Lectures for the course: Data Warehousing and Data Mining (406035)

Similar documents
Dimensional Data Modeling for the Data Warehouse

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

DATA WAREHOUSING - OLAP

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

DATA WAREHOUSING AND OLAP TECHNOLOGY

Designing a Dimensional Model

Data W a Ware r house house and and OLAP II Week 6 1

Review. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies

(Week 10) A04. Information System for CRM. Electronic Commerce Marketing

DATA WAREHOUSE E KNOWLEDGE DISCOVERY

Week 3 lecture slides

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

OLAP Systems and Multidimensional Expressions I

New Approach of Computing Data Cubes in Data Warehousing

Data Warehouse: Introduction

When to consider OLAP?

Building Data Cubes and Mining Them. Jelena Jovanovic

Part 22. Data Warehousing

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Delivering Business Intelligence With Microsoft SQL Server 2005 or 2008 HDT922 Five Days

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing

Data Warehousing, OLAP, and Data Mining

Optimizing Your Data Warehouse Design for Superior Performance

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

OLAP OLAP. Data Warehouse. OLAP Data Model: the Data Cube S e s s io n

Monitoring Genebanks using Datamarts based in an Open Source Tool

Fluency With Information Technology CSE100/IMT100

Microsoft Data Warehouse in Depth

Data Warehouse Design

Chapter 3, Data Warehouse and OLAP Operations

CHAPTER 4 Data Warehouse Architecture

M Designing and Implementing OLAP Solutions Using Microsoft SQL Server Day Course

Data Warehousing and Decision Support. Torben Bach Pedersen Department of Computer Science Aalborg University

Learning Objectives. Definition of OLAP Data cubes OLAP operations MDX OLAP servers

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes

SQL Server Analysis Services Complete Practical & Real-time Training

A Technical Review on On-Line Analytical Processing (OLAP)

Data Warehousing & OLAP

Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi

Data Warehousing and OLAP

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

70-467: Designing Business Intelligence Solutions with Microsoft SQL Server

Week 13: Data Warehousing. Warehousing

DIMENSIONAL MODELLING

14. Data Warehousing & Data Mining

Data Warehousing and Data Mining. A.A Datawarehousing & Datamining 1

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?

CS2032 Data warehousing and Data Mining Unit II Page 1

DATA CUBES E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

SAS BI Course Content; Introduction to DWH / BI Concepts

Web Log Data Sparsity Analysis and Performance Evaluation for OLAP

Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1

Knowledge Discovery in Databases. Databases. date name surname street city account no. payment balance

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

Business Intelligence, Data warehousing Concept and artifacts

Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer

Basics of Dimensional Modeling

Data Warehousing (DW) Online Analytical Processing (OLAP) Data Mining

CS1011: DATA WAREHOUSING AND MINING TWO MARKS QUESTIONS AND ANSWERS

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

Data Warehousing and OLAP Technology for Knowledge Discovery

Data warehousing. Han, J. and M. Kamber. Data Mining: Concepts and Techniques Morgan Kaufmann.


Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

What is OLAP - On-line analytical processing

Hybrid OLAP, An Introduction

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

UNIT-3 OLAP in Data Warehouse

OLAP Theory-English version

A Critical Review of Data Warehouse

University of Gaziantep, Department of Business Administration

Mario Guarracino. Data warehousing

Distance Learning and Examining Systems

Chapter 20: Data Analysis

OLAP and Data Warehousing! Introduction!

1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining

Data Warehousing and Data Mining

Overview of Data Warehousing and OLAP

Unit -3. Learning Objective. Demand for Online analytical processing Major features and functions OLAP models and implementation considerations

MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

On-Line Application Processing. Warehousing Data Cubes Data Mining

LEARNING SOLUTIONS website milner.com/learning phone

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Turkish Journal of Engineering, Science and Technology

Data Warehouse. MIT-652 Data Mining Applications. Thimaporn Phetkaew. School of Informatics, Walailak University. MIT-652: DM 2: Data Warehouse 1

Business Intelligence

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

3/17/2009. Knowledge Management BIKM eclassifier Integrated BIKM Tools

II. OLAP(ONLINE ANALYTICAL PROCESSING)

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Building an Effective Data Warehouse Architecture James Serra

IST722 Data Warehousing

This tutorial will help computer science graduates to understand the basic-toadvanced concepts related to data warehousing.

Data Mining for Knowledge Management. Data Warehouses

Decision support systems are the core of business

Transcription:

Lectures for the course: Data Warehousing and Data Mining (406035) Week 1 Lecture 1 Discussions on the need for data warehousing How DW is different from OLTP databases Week 2 Lecture 2 Evaluation norms were announced Class Test date was announced Expectations from Term paper and Term Project were announced What is a data warehouse? Lecture 3 Why a data warehouse is required? Components of a data warehouse Source System, Data Staging Area, Presentation server and User Interface. Lecture 4 (a) and (b) 3 tier Data warehouse architecture OLAP ROLAP, MOLAP and HOLAP Multidimensional Data Model (MDM) 2-D and 3-D representations of Data in MDM Slicing, Dicing, Roll-up and Drill down OLAP operations Hierarchy in a dimension Week 3 Lecture 5 OLAP operations re-visited Concept Hierarchy Schema Hierarchy and Set-grouping Hierarchy defined Total order and partial order among concept hierarchy levels explained Roll-up and Drill down using concept hierarchy and dimension reduction

Lecture 6 Data Cube defined as a lattice of cuboids DMQL statement for cube definition Types of measure explained Distributive, Algebraic and Holistic Lecture 7 (a) and (b) ERD and normalization revisited Dimensional Modeling Fact tables and dimension tables De-normalization and its effect on data warehouse design Report generation from Dimensional Model Star schema and Snowflake Schema - Definitions How star schema provides symmetric entry into the fact table from the dimension tables Week 4 Lecture 8 Fact table and Dimension table revisited Size estimate of Fact and Dimension tables Four main steps in Data warehouse design Identify business process, Define grain, Identify dimensions and Identify facts Data marts Flexibility of dimensional models How dimensional model can handle new measures and new dimensions in the Fact tables. How old records are updated with default values for new columns New attributes in the dimension table Details of date dimension Gray et al paper circulated Lecture 9 Snowflake and Fact constellation schemas Factless fact table Degenerate dimensions Sparse fact tables Multiple fact tables with new dimensions Addition of new dimensions in existing fact tables e.g., freq_cust_id in sales fact tables Retail Sales fact table and Promotion Fact tables How they operate together

Week 5 Lecture 10 Recap of old concepts Star schema, Snowflake schema, Fact constellation, Factless fact table De-normalized fact table Introduction to Inventory Business process Lecture 11 Inventory Periodic Snapshot Fact table schema Additive, Semi-additive and Non-additive facts Size estimate of periodic snapshot schema Limitations of periodic snapshot schema Introduction to inventory transactions Lecture 12 (a) and (b) Inventory Transactions Inventory Accumulating Snapshot Week 6 Lecture 13 Data Warehouse Bus Architecture Data Warehouse Bus Matrix Conformed Dimensions Identical Table, Sub-set Dimension and Roll-up Dimension Lecture 14 Slowly Changing Dimensions Type 1, Type 2, Type 3 and Type 6 response Customer Relationship Management Data Mart Aggregated and Segmentation Attributes of Customer Dimension Rapidly Changing Large Dimensions Outrigger Dimension Minidimension Effect of Minidimension on Dimension-only queries First Class Test was held here Week 7

Lecture 15 Recap of Changing Dimensions Slowly changing and Rapidly Changing Dimensions Lecture 16 E-Commerce Data Warehouse Basics of Browser-Web Server Interaction Clickstream Analysis and Web Log as Source of Information Challenges in using web log data Identification of Visitor Source, Visitor, Session, Proxy Servers and Browser Caches Unique Dimensions in E-Commerce Data Warehouses Session, Page, Event and Referral Fact Tables Complete Session Facts and Page Event Facts E-Commerce Data Warehouse Sizing Lecture 17 (a) and (b) View Materialization Full Materialization, No Materialization and Partial Materialization Which Views to Materialize? How to Use Materialized Views for Optimizing Queries? How to Efficiently Update and Refresh Materialized Views? Efficient Implementation of Materialized View Calculation in MOLAP. Data Cube Chunks and Order of Visiting Data Chunks for Efficient Calculation of Aggregation. Reduction in Memory Requirement for Partial Sum Storage. Week 8 Lecture 18 Materialized view selection Paper by Harinarayan, Rajaram and Ullman Greedy Algorithm Lecture 19 Continued with the greedy algorithm and further examples Indexing OLAP Data Paper by Sarawagi Multidimensional Array indexing sparse dimension issues Bitmap Indexing

Lecture 20 (a) and (b) Multidimensional Indexing R Tree Join Indexing A brief introduction to Business Dimensional Life Cycle for Data Warehousing Projects Week 9 Repeat of First Class Test was held here for students who missed due to campus interviews. Mid-Sem Exam was held here. Week 10 Lecture 21 Introduction to Data Mining KDD and Data Mining SQL and Data Mining Mining Association Rules Why they are Important? Itemsets, Frequent Itemsets, Infrequent Itemsets Downward and Upward Closure Properties Lecture 22 (a) and (b) A priori Algorithm for Association Rule Mining Generation of Frequent Itemsets Extraction of Association rules using Confidence Measures How Association Rules may be used in Data Warehouses Week 11 Lecture 23 Partitioning Algorithm for Association Rule Mining Lecture 24 Dynamic Itemset Counting Algorithm for Association Rule Mining proposed by Brin et al.

Lecture 25 (a) and (b) FP-tree Algorithm for Association Rule Mining proposed by Han et al. Week 12 Lecture 26 Clustering Algorithms What is Clustering? Why we need Clustering? K-means Clustering Lecture 27 K-medoid Clustering PAM algorithm Week 13 Lecture 28 Clustering Algorithm CLARA and CLARANS Lecture 29 Hierarchical Clustering Agglomerative and Divisive Intra-cluster and Inter-cluster distance measures Dendrogram and its representation Agglomerative hierarchical clustering using average inter-cluster distance Lecture 30 (a) and (b) Clustering Algorithm BIRCH Clustering Feature and Clustering Feature Tree Week 14 Lecture 31 Classification - What is Classification?

Training, Testing and Using a Classifier Confusion Matrix Multilayer Perceptron as a Classifier Lecture 32 Back Propagation for MLP Training Other Learning methods for MLP Lecture 33 (a) and (b) Use of Decision Tree for Classification ID3 Algorithm Information Gain Fuzzy Classification Week 15 Lecture 34 Overview of Text and Web Mining Lecture 35 Summary and Feedback Lecture 36 (a) and (b) Term Project Demo