Data Warehouse: Introduction



Transcription:

Database and Data Mining Group of Politecnico di Torino
Data Warehouse: Introduction
Copyright. All rights reserved.

Decision support systems
  - Huge operational databases are available in most companies
      - these databases may provide a large wealth of useful information
  - Decision support systems provide means for
      - in-depth analysis of a company's business
      - faster and better decisions

Strategic decision support
  - Demand evolution analysis and forecast
  - Critical business areas identification
  - Budgeting and management transparency
      - reporting, practices against fraud and money laundering
  - Identification and implementation of winning strategies
      - cost reduction and profit increase

Business Intelligence
  - BI supports strategic decision making in companies
  - Objective: transforming company data into actionable information
      - at different detail levels
      - for analysis applications
  - Users may have heterogeneous needs
  - BI requires an appropriate hardware and software infrastructure

Applications
  - Manufacturing companies: order management, client support
  - Distribution: user profile, stock management
  - Financial services: buyer behavior (credit cards)
  - Insurance: claim analysis, fraud detection
  - Telecommunication: call analysis, churning, fraud detection
  - Public service: usage analysis
  - Health: service analysis and evaluation
  - ... and many more ...

Example
  - Bank clients with a loan
      - bad clients: still owing periodic payments to the bank after the due date
      - good clients: respecting the periodic payment due dates
  [Figure: bank clients plotted by Loan Amount and Income]

Example (continued)
  [Figure: the same Loan Amount and Income plot, with a threshold k marked on Income]
  - If Income < k then bad client

Data management
  - Traditional DBMS usage, characterized by
      - detailed data, relational representation
      - snapshot of current data state
      - well-known, structured and repetitive operations
      - read/write access to few records
      - short transactions
      - isolation, reliability and integrity (ACID) are critical
      - database size 100MB-GB

Data analysis
  - Data processing for decision support, characterized by
      - historical data
      - consolidated and integrated data
      - ad hoc applications
      - read access to millions of records
      - complex queries
      - data consistency before and after periodical loads
      - database size 100GB-TB

Data warehouse
  - Database devoted to decision support, which is kept separate from company operational databases
  - Data which is
      - integrated
      - time dependent, non volatile
      - devoted to a specific subject
      - used for decision support in a company
  W. H. Inmon, Building the data warehouse, 1992

Why separate data?
  - Performance
      - complex queries reduce performance of operational transaction management
      - different access methods at the physical level
  - Data management
      - missing information (e.g., history)
      - data consolidation
      - data quality (inconsistency problems)
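The threshold rule from the loan example (if Income < k then bad client) can be illustrated with a short Python sketch. The `Client` type, the sample clients, and the value of `k` are hypothetical and chosen only for illustration; the slides give no concrete numbers.

```python
from dataclasses import dataclass


@dataclass
class Client:
    income: float
    loan_amount: float


def classify(client: Client, k: float) -> str:
    """Apply the rule from the example: Income < k means bad client."""
    return "bad" if client.income < k else "good"


# Hypothetical data: neither the clients nor the threshold come from the slides.
clients = [
    Client(income=18_000, loan_amount=5_000),
    Client(income=42_000, loan_amount=12_000),
    Client(income=30_000, loan_amount=30_000),
]
k = 30_000

labels = [classify(c, k) for c in clients]
print(labels)
```

Note that the rule looks only at income and ignores the loan amount, which is why a single vertical threshold is enough to separate the two groups in the plot.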

Data warehouse: architecture
  [Figure: architecture diagram with the components (External) data sources, Back-end tools, Data warehouse, Data marts, Metadata, DW management, OLAP servers, Analysis tools, Data analysis]

Data warehouse and data mart
  - Company data warehouse: it contains all the information on the company business
      - extensive functional modelling process
      - design and implementation require a long time
  - Data mart: departmental information subset focused on a given subject
      - two architectures
          - dependent, fed by the company data warehouse
          - independent, fed directly by the sources
      - faster implementation
      - requires careful design, to avoid subsequent data mart integration problems

Back-end tools
  - Feed the data warehouse (ETL = Extraction, Transformation, Loading)
      - data extraction from data sources
      - data cleaning (errors, missing or duplicated data)
      - format transformations and conversions
      - data loading and periodical refresh

Multidimensional representation
  - Data are represented as a (hyper)cube with three or more dimensions
  - Measures on which analysis is performed: cells at dimension intersection
  - Data warehouse for tracking sales in a supermarket chain:
      - dimensions: product, shop, time
      - measures: sold quantity, sold amount, ...
  [Figure: sales cube with axes shop, product and date; labels in the figure: 3, SupShop, MilkTTT, 2-3-2000. From Golfarelli, Rizzi, Data warehouse, teoria e pratica della progettazione, McGraw Hill 2006]

Data analysis tools
  - OLAP analysis: complex aggregate function computation
      - support for different types of aggregate functions (e.g., moving average, top ten)
  - Data analysis by means of data mining techniques
      - various analysis types
      - significant algorithmic contribution
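The multidimensional representation and the OLAP aggregates mentioned above (moving average, top ten) can be sketched in plain Python. This is a minimal illustration, not how an OLAP server stores data: the cube is a dictionary keyed by (product, shop, day), and all product names, shop names and quantities are hypothetical.

```python
from collections import defaultdict

DIMENSIONS = ("product", "shop", "day")

# Hypothetical facts: (product, shop, day) -> sold quantity.
cube = {
    ("milk", "shop_a", 1): 3,
    ("milk", "shop_a", 2): 5,
    ("milk", "shop_b", 1): 2,
    ("bread", "shop_a", 1): 4,
    ("bread", "shop_b", 2): 7,
}


def roll_up(cube, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, qty in cube.items():
        totals[tuple(coords[i] for i in idx)] += qty
    return dict(totals)


def top_n(totals, n):
    """Return the n groups with the largest aggregated measure."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


by_product = roll_up(cube, ["product"])
by_day = roll_up(cube, ["day"])
daily_totals = [by_day[key] for key in sorted(by_day)]

print(by_product)
print(top_n(by_product, 1))
print(moving_average(daily_totals, 2))
```

Each cell of the dictionary corresponds to one intersection of the three dimensions; `roll_up` collapses the dimensions that are not of interest, which is the aggregation step that OLAP tools perform at much larger scale.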

Data analysis tools
  - Presentation
      - separate activity: data returned by a query may be rendered by means of different presentation tools
  - Motivation search
      - data exploration by means of progressive, incremental refinements (e.g., drill down)

OLAP analysis
  [Figure: slicing and dicing, and aggregation, on the sales cube with dimensions shop, product and year; selections shown: city='Turin', product category='food products', year=2000; aggregation levels shown: city, product category. From Golfarelli, Rizzi, Data warehouse, teoria e pratica della progettazione, McGraw Hill 2006]

Types of data mining activities
  - Classification and regression: predictive model generation
      - requires a previously labeled data set
  - Association rules: extraction of data correlations
  - Clustering: data partitioned into homogeneous groups
      - requires the notion of distance between two elements

Example: classification

  Age   Car type   Risk category
  40    SW         low
  65    sport      high
  20    utility    high
  25    sport      high
  50    utility    low

  [Figure: decision tree with the tests Age < 26 and Car type = sport; leaves are labeled high or low]

Example: association rules
  - Given a collection of counter transactions in a supermarket (receipts)
  - Association rule: diapers → beer
      - 2% of transactions contain both elements
      - 30% of transactions containing diapers also contain beer

Servers for Data Warehouses
  - ROLAP (Relational OLAP) server
      - extended relational DBMS
          - compact representation for sparse data
      - SQL extensions for aggregate computation
      - specialized access methods which implement efficient OLAP data access
  - MOLAP (Multidimensional OLAP) server
      - data represented in proprietary (multidimensional) matrix format
          - sparse data require compression
      - special OLAP primitives
  - HOLAP (Hybrid OLAP) server
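The two percentages quoted in the association rule example are usually called support (the fraction of transactions containing both items) and confidence (the fraction of transactions containing diapers that also contain beer). The sketch below computes both on a tiny, hypothetical set of receipts; the data is invented for illustration and does not reproduce the 2% and 30% figures.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with `antecedent` that also hold `consequent`."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical receipts, one set of purchased items per transaction.
receipts = [
    {"diapers", "beer", "milk"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer"},
    {"milk", "bread"},
]

s = support(receipts, {"diapers", "beer"})
c = confidence(receipts, {"diapers"}, {"beer"})
print(f"support={s:.2f} confidence={c:.2f}")
```

Here two of the five receipts contain both items and three contain diapers, so the rule diapers → beer has support 2/5 and confidence 2/3.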

Relational representation: star model
  - (Numerical) measures stored in the fact table
      - attribute domain is numeric
  - Dimensions describe the context of each measure in the fact table
      - characterized by many descriptive attributes
  - Example: data warehouse for tracking sales in a supermarket chain
  [Figure: star schema with the fact table Sale and the dimension tables Shop, Product and Date]

Data warehouse size
  - Time dimension: 2 years × 365 days
  - Shop dimension: 300 shops
  - Product dimension: 30,000 products, of which 3,000 sold every day in every shop
  - Number of rows in the fact table: 730 × 300 × 3,000 = 657 million
  - Size of the fact table: 21GB

Metadata
  - Different types of metadata:
      - for data transformation and loading: describe data sources and needed transformation operations
      - for data management: describe the structure of the data in the data warehouse (also for materialized views)
      - for query management: data on query structure and execution
          - SQL code for the query
          - execution plan
          - memory and CPU usage

Textbooks
  - Data warehousing
      - Golfarelli, Rizzi, Data warehouse: teoria e pratica della progettazione, McGraw-Hill 2006
      - Kimball et al., textbooks on methodology and case studies, Wiley
  - Data mining
      - Han, Kamber, Data mining: concepts and techniques, Morgan Kaufmann 2006
      - Tan, Steinbach, Kumar, Introduction to data mining, Pearson 2006

Useful links
  - Data warehouse
      - http://www.dwinfocenter.org
      - http://www.dwreview.com
      - http://kimballuniversity.com
  - Data mining
      - http://www.kdnuggets.com/
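The fact table sizing from the "Data warehouse size" slide can be checked with a few lines of Python. The row count follows directly from the figures on the slide. The bytes-per-row value is an assumption: the slide does not state it, and 32 bytes is used here only because it reproduces the quoted 21GB.

```python
# Figures taken from the slide.
days = 2 * 365                   # time dimension: 2 years
shops = 300                      # shop dimension
products_total = 30_000          # product dimension
products_sold_per_day = 3_000    # products actually sold per day per shop

# One fact row per (day, shop, product sold).
rows = days * shops * products_sold_per_day

# Assumption, not stated in the source: average row width in bytes.
BYTES_PER_ROW = 32
size_gb = rows * BYTES_PER_ROW / 10**9

# Share of the full (day, shop, product) cube that actually holds a fact.
cube_cells = days * shops * products_total
fill_ratio = rows / cube_cells

print(f"{rows:,} rows, about {size_gb:.1f} GB, fill ratio {fill_ratio:.0%}")
```

The fill ratio shows that only one cell in ten of the full cube holds a fact, which is why sparse data handling matters for both ROLAP and MOLAP servers.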