DIMENSIONAL MODELLING



Similar documents
Designing a Dimensional Model

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance

Dimensional Data Modeling for the Data Warehouse

Optimizing Your Data Warehouse Design for Superior Performance

Week 3 lecture slides

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

The Benefits of Data Modeling in Business Intelligence

Upon successful completion of this course, a student will meet the following outcomes:

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

A Design and implementation of a data warehouse for research administration universities

How To Model Data For Business Intelligence (Bi)

Data Warehousing and Data Mining

OLAP Theory-English version

LEARNING SOLUTIONS website milner.com/learning phone

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

Data Warehouse Design

New Approach of Computing Data Cubes in Data Warehousing

MICROSOFT DATA WAREHOUSE IN DEPTH

Dimensional modeling for CRM Applications

Microsoft Data Warehouse in Depth

Part 22. Data Warehousing

Fluency With Information Technology CSE100/IMT100

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

Chapter 7 Multidimensional Data Modeling (MDDM)

1. Dimensional Data Design - Data Mart Life Cycle

A Service-oriented Architecture for Business Intelligence

Analysis Services Step by Step

MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

Advanced Data Management Technologies

University of Gaziantep, Department of Business Administration

When to consider OLAP?

MIS636 AWS Data Warehousing and Business Intelligence Course Syllabus

IST722 Data Warehousing

Logical Data Model for Retail Banking

HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT

CHAPTER 3. Data Warehouses and OLAP

14. Data Warehousing & Data Mining

Implementing Data Models and Reports with Microsoft SQL Server 20466C; 5 Days

Basics of Dimensional Modeling

DATA WAREHOUSE / BUSINESS

CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS

Data Warehousing and Decision Support. Torben Bach Pedersen Department of Computer Science Aalborg University

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration

Lection 3-4 WAREHOUSING

Microsoft Implementing Data Models and Reports with Microsoft SQL Server

DATA WAREHOUSING AND OLAP TECHNOLOGY

MDM and Data Warehousing Complement Each Other

CHAPTER 5: BUSINESS ANALYTICS

Data warehouse design

Delivering Business Intelligence With Microsoft SQL Server 2005 or 2008 HDT922 Five Days

Customer-Centric Data Warehouse, design issues. Announcements

Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Course 20467A; 5 Days

Multidimensional Modeling - Stocks

Dimodelo Solutions Data Warehousing and Business Intelligence Concepts

Republic Polytechnic School of Information and Communications Technology C355 Business Intelligence. Module Curriculum

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

SAS BI Course Content; Introduction to DWH / BI Concepts

SENG 520, Experience with a high-level programming language. (304) , Jeff.Edgell@comcast.net

DATA CUBES E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

Data Warehousing, OLAP, and Data Mining

Justice Data Warehousing and Court Business Intelligence. Technical Introduction. Harris County Courts

SQL Server 2012 Business Intelligence Boot Camp

Course Design Document. IS417: Data Warehousing and Business Analytics

DATA WAREHOUSING - OLAP

Implementing Data Models and Reports with Microsoft SQL Server

Data Warehousing Concepts

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

The IBM Cognos Platform

Dimensional Modeling 101. Presented by: Michael Davis CEO OmegaSoft,LLC

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

B. 3 essay questions. Samples of potential questions are available in part IV. This list is not exhaustive it is just a sample.

CHAPTER 4: BUSINESS ANALYTICS

BUILDING A HEALTH CARE DATA WAREHOUSE FOR CANCER DISEASES

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Data W a Ware r house house and and OLAP Week 5 1

Kimball Dimensional Modeling Techniques

Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov

Dimensional Modeling for Data Warehouse

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija,

Sterling Business Intelligence

Data W a Ware r house house and and OLAP II Week 6 1

SAP S/4HANA Embedded Analytics

Data Warehousing Systems: Foundations and Architectures

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

Transcription:

ASSIGNMENT 1 TO BE COMPLETED INDIVIDUALLY DIMENSIONAL MODELLING Describe and analyse the dimensional modelling (DM) design feature allocated to you. (The allocation of a design feature to a student will be presented at the end of the second lecture, F2.) Have in mind the questions given below when working throw it. A 500 words report summarising your answers shall be handed in electronically in FirstClass, the S1 conference belonging to the corresponding group (i.e., S1 under Grp1, Grp6 conferences), latest one hour after the seminar. A number of the presentation shall randomly be selected and evaluated. Do not forget to write you mane in the beginning of the report. Everyone shall present his/her work at the first seminar. Be prepared to give five minutes long pedagogical presentation and bring with you the necessary material for doing this. To achieve a maximal interaction and activity, the presentations shall be performed on a student-to-student basis (and not in a traditional studentto-class basis). Every student shall during a 10 minutes slot, meet a classmate during which he/she will present and discuss his/her work as well as listen to his mate s presentation. This process will be repeated so that every student will have the opportunity to present his/her work to everyone else with different topic in the group. Reading requirements [K&R] Questions A) Explain the design feature, its key characteristics, and the various ways of representing it in a DM. B) Explain the relevance of the design feature to any relevant aspects of OLAP functionality (drill-up, drill-down, browsing, choice of focal measures, etc). C) Explain why this design feature is used when creating multidimensional models. D) Explain the relevance of the DM design feature to the efficiency of data storage and response times of queries against the database. DM design features 1. Fact table including additive and semi-additive and non-additive facts 2. Surrogate keys 3. Granularity, Transaction level fact tables, Periodic Snapshot fact tables, 4. Accumulating snapshot fact tables 5. Snowflaking, pros and cons. 6. Degenerate and junk dimensions 7. Demographic dimensions and Mini-dimensions 8. Causal dimensions 9. Value-chain, Bus-architecture, Conformed dimensions 10. Slowly-changing dimensions 11. The concept of time within data warehousing 12. The customer dimension, customer hierarchies

ASSIGNMENT 3 PAPER PRESENTATION TO BE COMPLETED IN GROUPS Each group shall select and summarise a paper from the data warehousing area. Each group shall give a presentation of the selected paper on Seminar 3. The presentation should be well prepared. The use of OH slides is recommended. Presentations should take approximately 15 minutes. Please, make pedagogical presentations. A written report, i.e., extended abstract of the paper, of approximately 600-1000 words shall be handed in at the seminar together with the original paper. We recommend you to select papers from scientific periodicals or conference proceedings (you can find a number of these in the library), but even the Web can be a good source. The size of the paper you select shall be approximately 5000 words (which is 10-14 pages). You must be able to motivate your choice. It is allowed for two different groups to select the same paper, as far as they attend different seminar groups. To avoid confusion, each group shall publish in FirstClass (under the conference for the corresponding seminars and seminar group, i.e., Grp1 S3,,Grp6 S3,) the reference to the paper they have selected in order to book this paper, so that no other group after that shall make the same choose. EXTRA ASSIGNMENT (OPTIONAL) MASTER THESIS SUBJECT PROPOSAL TO BE COMPLETED INDIVIDUALLY NB! This assignment is optional. It shall be completed by students who are intending to achieve higher marks for the course (VG for SU students, and 4 or 5 for KTH students). The results of this assignment shall be a written report of about 2500 words. The report shall consist of a 1) brief background where 2) a problem within the data warehouse domain is clearly outlined and described, 3) a suggested method for work in order to solve this problem, as well as argumentation for the choice of this method shall be provided, 4) a related research section, build on at least five scientific publications, convincing the reader for the relevance of the proposed work, and 5) a reference list. The assignment, i.e. a paper copy of the report, together with copies of the scientific publications it is based on shall be handed in latest the day before the written exam. An electronic copy of the report in MS-word format shall be submitted to the corresponding conference in FC. The evaluation of this assignment will include both a scientific evaluation of the proposed thesis subject, as well as an evaluation of the presentation quality of the report.

ASSIGNMENT 2 - DATA WAREHOUSE DESIGN TO BE COMPLETED IN GROUPS Construct a multidimensional model for the banking mini-case described below. All groups shall present their solutions at seminar 2 and engage in active discussion after the presentations Documentation requirements Written documentation containing the following shall be handed in 24 hours before the seminar: A set of diagrams depicting the star-join schemas needed to solve the mini-case. Written descriptions of how the following issues were dealt with in the design: Heterogeneous products Aggregations The socio-demographic mini-dimension (i.e. demographic mini-dimensions) Slowly changing-dimensions The customer/account relationship Diagrammatic description of the hierarchies in the time dimension. Do not forget to write your names, group number, and assignment number. A BANKING MINI-CASE Access Banking AB is a small niche bank offering a limited range of banking products to private customers, they do not have any companies as customers. Decision-makers in the bank are at present working with a number of product development projects and want to get better feedback on customers preferences for the bank s various products and banking functions. Previous product development initiatives have been aimed at providing new value-adding functionality to the bank s two main products namely payment transferral services and current accounts with deposit and withdrawal functions. Each product has a set of basic banking functions associated with it. In addition to this customers can choose value-adding functions on the basis of their banking needs. This allows customers to choose their own product configurations depending on the set off value-adding functions they pick. Decision-makers want to check how customers utilise the various banking functions in order to discover possible trends in customers preferences for these functions. The decision-makers are also of the opinion that infrequent transactions with large amounts of money are better for the bank as this allows them to cut down on the expensive operation of processing transactions. They would like to ensure that they only develop new banking functions, which steer customers towards this more profitable pattern of behaviour. For all products the decision-makers want to see how much money is involved in each banking transaction and the number of times a certain customer has used each product configuration. Even the use of the individual banking functions in a product configuration are of interest to the decision-makers. For the purposes of time series analysis the decision-makers feel they need to view information at the most detailed level possible (i.e. individual transactions). They would like to be able to aggregate measures to days, weeks, months, quarters, tertiaries, half-years, and years. There is however also a fiscal year that is shifted by one month from the standard calendar year. The fiscal year starts on the 25 th of January and the months in the fiscal year are numbered one to twelve. For the fiscal year the decision-makers would like to aggregate over months, quarters, tertiaries, and years. Traditionally the bank has had an account oriented approach and it is now felt that they would like to assume a more customer oriented approach in their analysis. Behind this lies the assumption that the preferences for certain products is based nearly entirely on customers socio-demographic attributes. Unfortunately the bank s source data only links transactions with accounts. To further complicate the issue each customer can have several accounts and each account can be held by several customers. The information on which customer owns which accounts is however captured in the bank s well maintained customer register. The bank has a highly mobile customer stock, most of them working in large multi-national corporations and being frequently relocated to various subsidiaries. In addition to this the rate of product innovation in the bank is

high and new banking functions are released on a regular basis. The bank aims to have a rapidly evolving product offering in order to meet the heightened competition in the banking business. The decision-makers are aware of a set of new tools on the market, which they plan to exploit for exploratory analysis of their data. In order to do so they intend to first build a small data mart with high quality data extracted from the operational databases in the bank. Socio- demographic information on customers will be collected from an external information vendor. The design of the data mart will be based on star-join schemata. BASIS REQUIREMENTS ON THE DESIGN Your assignment is to design a set of star join-schemas that will provide the decision-makers with the information they need in the product development process. A number of factors need to be taken into consideration when designing the star-join schemas: The bank s products are essentially heterogeneous. They can be divided into three main product types (see listing of entities and their attributes), where each product type has its own set of basic banking functions and a set of optimal functions. Decision-makers are not always going to be interested in comparing attributes of different product types against each other. They will instead want to focus on each product type individually when performing exploratory analyses. It is only at the product type level that they will need to make any comparisons between the three. Ensure that you optimise the design of the star-join schema so that optimal browsing performance is provided to the decision-makers when they analyse the bank s heterogeneous product range. The complex many-to-many relationship between account and customer must be taken into consideration if the decision-makers are going to be able to perform customer-oriented analyses. In order to give any sort of relevance to the historical analyses analysts must be able to see when customers opened an account and when they closed it. All this information will in addition have to be linked to the transaction record for each customer. This will also allow the bank to see if the introduction of new product functions has attracted new customers. Ensure that the star-join schemas depict the relationship between customer and account as well as the history of this link. Motivate your solution and explain how the drill-across functionality needed to make this link is supported by your design solution. As mentioned above customer and product attributes change slowly over time. Decision-makers want to be able to guarantee the historical relevance of all data and accurately partition time on the basis of product and customer changes. Ensure that the design of the star-join schemas take this into account and explain which strategy you will adopt to deal with these slowly changing dimensions. Customers have many attributes but it is the socio-demographic ones will probably be of most interest to decision-makers when they are browsing in their OLAP tools. Ensure that the star-join schema includes a mini-dimension for the purposes of quickly aggregating on the basis of customers socio-demographic attributes. Motivate your design solution and explain in which form the socio-demographic attributes must be presented in the mini-dimension. Another requirement that decision-makers have is that they can quickly aggregate on the level of product type and month. Ensure that your design of the star-join schemas allows for the pre-aggregation of facts at this level so that they can quickly access the information they need. Motivate your choice of design solution. Finally, ensure that the time dimension supports all the hierarchies needed to support the decision-makers requirements for aggregations when performing time series analyses. Depict these hierarchies in a treestructure in order to clarify their structure. LISTING OF ENTITIES AND THEIR ATTRIBUTES The following are a list of all the main entities which can be used in the star-join schemas. Fields for the different fact tables and dimension tables must be selected from the list below. It may be necessary to create derived facts which are not included in the list below but which can be calculated from the bank s transaction history.

Customer Customer number (unique) First name Second name Date of birth Street address Postal code Postal area Communal code Country code Behavioural indicator Education level Net worth to bank Occupation Gender Dependence Marital status Home ownership status Income Individual life cycle status Customer segment Contact person in bank Contact unit in bank Account Account number (unique) Account lifecycle status Date of opening Date of termination Date last modified EDA code Account status Account category Branch Account type Balance Product Product type Product type description Product responsibility in bank Product type lifecycle status The three major product types are described below A) Product type: Deposit and withdrawal facility with overdraft option Basic functions Checking function type ATM access type Credit card function type Overdraft limit Overdraft interest rate Value added functions Tele-bank function Internet bank function Quick checking International checking International ATM access Extra card option International cash insurance International lost card insurance Advanced accounting and reporting B) Product type: Deposit and withdrawal facility without overdraft option Basic functions Checking function type ATM access type Credit card function type Value added functions Tele-bank function Internet bank functions Quick checking International checking International ATM access Advanced accounting and reporting Accounting and reporting on diskette

Home budget management Type C) Product type: Payment transferral (Giro) Basic functions Account-to-account transfer type Account-to-bank transfer type Account withdrawals via bank Account deposits via bank Value added functions Tele-bank function Internet bank function Quick clearance Payment monitoring Tax payment monitoring Advanced accounting and reporting Accounting and reporting on diskette Payment transactions Transaction number (unique) Amount Date Initiating account Product utilised