Introduction to Data Warehousing and OLAP


Outline

Part I: Introduction
- OLAP vs OLTP
- Data Cleaning and Integration
Part II: Data Models and Warehouse Design
Part III: Index Structures for Data Warehouses

Types of Data: Operational Data (OLTP applications)
- "Data that works"
- Frequent updates and queries
- Normalized for efficient search and updates (minimizes update anomalies)
- Fragmented, with local relevance
- Point queries: queries that access individual tuples

Types of Data: Historical Data (OLAP applications)
- "Data that tells"
- Very infrequent updates
- Integrated data set with global relevance
- Analytical queries that require huge amounts of aggregation
- Performance issues lie mainly in query response time (not in updates)

Example OLTP Queries
- What is the salary of Mr. Ali?
- What is the address and phone number of the person in charge of the Supplies department?
- How many employees have received an excellent credential in the latest appraisal?

Example OLAP Queries
- How is the employee attrition scene changing over the years across the company?
- Is there a correlation between the geographical location of a company unit and excellent employee appraisals?
- Is it financially viable to continue our manufacturing unit in Taiwan?

A Data Warehouse
- An infrastructure to manage historical data
- Designed to support OLAP queries that make heavy use of aggregation
- Post-retrieval processing (reporting) is as complex as the retrieval itself, if not more so

[Figure: Warehousing Data. Operational data from several OLTP units flows through Data Cleaning and Integration into the Data Warehouse.]

Data Marts
- A data warehouse can be seen as a collection of data marts, each holding historical data about one OLTP segment that feeds into the warehouse
- Data marts are also seen as small warehouses for OLAP activities within a given segment

[Figure: Data Cleaning and Integration. OLTP databases feed the DCU and DIU, which populate the Data Warehouse; a back flush returns updates / feedback to the OLTP databases.]

Dirty Data: Lack of Standardization
- Multiple encodings, locales, languages
- Spurious abbreviations: "Allama Iqbal Road" and "A.I. Road" are the same
- Semantic equivalence: "Rawalpindi" is the same as "Pindi"
- Multiple standards: 1.6 kilometres is the same as 1 mile

Dirty Data: Missing, Spurious and Duplicate Data
- Missing age field for an employee
- Spurious (incorrectly entered) sales values
- Duplication of data sets across OLTP units
- Semantic duplication ("M. A. Khan" appearing in another data set as "Khan Muhammad Ali")

Dirty Data: Inconsistencies
- Incorrect use of codes (use of M/F in addition to 0/1 for gender)
- Codes with inconsistent or outdated meaning (travel eligibility "C" denoting eligibility to travel only by III class sleeper, which no longer exists)
- Inconsistent duplicate data (two data sets are found to belong to the same person, but carry two different addresses)

Dirty Data: Inconsistencies (continued)
- Inconsistent associations (sales figures provided by the marketing department do not add up to the total sales figures reported by the retail units)
- Semantic inconsistencies (Feb 31st)
- Referential inconsistency (Rs. 10 lakhs of sales reported from a unit that has been closed down)

Issues in Data Cleaning
- Cannot be fully automated
- GIGO (garbage in, garbage out)
- Requires considerable knowledge that is tacit and beyond the purview of the warehouse (metrics, geography, govt. policies, etc.)
- Complexity increases (usually geometrically) with the number of data sources
- Complexity increases with the history span that is taken up for cleaning

Steps in Data Cleaning (Rahm and Do [1])
1. Data analysis: analyze the data set to obtain meta-data and detect dirty data
2. Definition of transformation rules: transform data from its current dirty form to the required clean form. Transformation can be at either the schema level or the data level
3. Rule verification: verify the transformation rules on test data sets
4. Transformation: execute the transformation rules on the data set
5. Backflow: re-populate the data sources with cleaned data

Data Analysis Techniques (Refs [1],[2])

Problem to be detected          Meta-data used
Illegal values                  (max, min), (mean, deviation), cardinality
Spelling mistakes               Hashing, N-gram outliers
Lack of standards               Column comparison (compare value sets from a given column across tables)
Duplicate and missing values    Compare cardinality with #rows, detect nulls, use rules to predict incorrect or missing values

Transformation Algorithms: Hash-Merge for duplicate elimination
1. Hash tuples into buckets based on a given column
2. (Tuples with duplicate values are hashed onto the same bucket)
3. Merge tuples within each bucket separately
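The hash-merge steps above can be sketched in a few lines. This is a minimal illustration, not the slides' implementation: the function names `hash_merge` and `merge_depts` are hypothetical, a Python dict stands in for the hash buckets, and the merge rule (collect all departments of duplicate rows) is an assumption.

```python
from collections import defaultdict

def hash_merge(rows, key_columns, merge):
    """Hash rows into buckets on the key columns, then merge each bucket."""
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        buckets[key].append(row)  # rows with equal keys land in the same bucket
    return [merge(group) for group in buckets.values()]

def merge_depts(group):
    """Assumed merge rule: keep the first row, collect every department."""
    merged = dict(group[0])
    merged["dept"] = [row["dept"] for row in group]
    return merged

rows = [
    {"name": "M.A. Khan", "address": "50, Lvl Rd.", "dept": "Sales"},
    {"name": "Saleem", "address": "25, LB Rd", "dept": "R&D"},
    {"name": "M.A. Khan", "address": "50, Lvl Rd.", "dept": "PR"},
    {"name": "Rahim", "address": "30, Snky Rd.", "dept": "Products"},
]
deduplicated = hash_merge(rows, ["name", "address"], merge_depts)
```

Because the dict keys are exact values, each bucket here holds only true duplicates; a real hash table with a fixed number of buckets would still need a comparison inside each bucket.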

Transformation Algorithms: Hash-Merge example

Name        Address        Dept
M.A. Khan   50, Lvl Rd.    Sales
Saleem      25, LB Rd      R&D
M.A. Khan   50, Lvl Rd.    PR
Rahim       30, Snky Rd.   Products

[Figure: the rows above are hashed on the hash key into hash buckets.]

Transformation Algorithms: Sorted Neighborhood Technique for misspelling integration
1. Identify a set of data values within a given row as the key
2. Sort the table based on the key
3. Slide a window of n rows over the sorted table and merge data values based on rules (Ex: merge names if all other values like age, address, dept, etc. match)
4. Make multiple passes until there are no more merges of records
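A minimal sketch of the sorted neighborhood steps above, under stated assumptions: the names `sorted_neighborhood`, `same_person` and `combine` are hypothetical, the match rule is exact equality of name and address, and merged rows simply join their departments with "/".

```python
def sorted_neighborhood(rows, sort_key, is_match, merge, window=3):
    """Sort on the key, then merge matching rows that share a sliding window."""
    rows = sorted(rows, key=sort_key)
    merged_something = True
    while merged_something:            # repeat passes until nothing merges
        merged_something = False
        for i in range(len(rows)):
            # compare row i only with the rows inside its window
            for j in range(i + 1, min(i + window, len(rows))):
                if is_match(rows[i], rows[j]):
                    rows[i] = merge(rows[i], rows[j])
                    del rows[j]
                    merged_something = True
                    break
            if merged_something:
                break                  # restart the pass on the shorter table
    return rows

def same_person(a, b):
    return a["name"] == b["name"] and a["address"] == b["address"]

def combine(a, b):
    return {**a, "dept": a["dept"] + "/" + b["dept"]}

table = [
    {"name": "M.A. Khan", "address": "50, Lvl Rd.", "dept": "Sales"},
    {"name": "Saleem", "address": "25, LB Rd", "dept": "R&D"},
    {"name": "M.A. Khan", "address": "50, Lvl Rd.", "dept": "PR"},
    {"name": "Rahim", "address": "30, Snky Rd.", "dept": "Products"},
]
cleaned = sorted_neighborhood(
    table, lambda r: (r["name"], r["address"]), same_person, combine, window=3
)
```

In practice the match rule would be an approximate comparison (edit distance, phonetic codes) rather than equality; equality keeps the sketch short.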

Transformation Algorithms: Sorted Neighborhood example

Before sorting:

Name        Address        Dept
M.A. Khan   50, Lvl Rd.    Sales
Saleem      25, LB Rd.     R&D
M.A. Khan   50, Lvl Rd.    PR
Rahim       30, Snky Rd.   Products

Rule: merge rows if name and address match. Window size n = 3.

After sorting:

Name        Address        Dept
M.A. Khan   50, Lvl Rd.    Sales
M.A. Khan   50, Lvl Rd.    PR
Rahim       30, Snky Rd.   Products
Saleem      25, LB Rd.     R&D

Transformation Algorithms (Monge Elkan 97, [3]): Graph-based transitive closure to reduce the number of passes
1. Use the sorted neighborhood technique and sort records based on identified keys
2. Create an undirected graph structure where nodes correspond to records and edges correspond to the "is a duplicate of" relationship
3. Records R1 and R2 need not be compared in any pass if they belong to the same connected component
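The connected-component idea above can be sketched with a union-find (disjoint set) structure. This is a simplified illustration, not the algorithm of [3] itself: the names `DisjointSet` and `window_scan` are hypothetical, and records are assumed to be already sorted on the key.

```python
class DisjointSet:
    """Tracks connected components of the 'is a duplicate of' graph."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def window_scan(records, is_duplicate, window=3):
    """Slide a window over sorted records, skipping already-connected pairs."""
    components = DisjointSet(len(records))
    comparisons = 0
    for i in range(len(records)):
        for j in range(i + 1, min(i + window, len(records))):
            if components.find(i) == components.find(j):
                continue                 # same component: no comparison needed
            comparisons += 1
            if is_duplicate(records[i], records[j]):
                components.union(i, j)
    return components, comparisons

records = ["khan", "khan", "khan", "rahim", "saleem"]
components, comparisons = window_scan(records, lambda a, b: a == b)
```

On this input the scan performs 6 comparisons instead of the 7 a naive window of size 3 would need, because the second and third records are already known to be connected through the first.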

[Figure: Transformation algorithms, records R1 to R5 with window size 3.
 Naïve sliding window: Slide 0: R1, R2, R3; Slide 1: R2, R3, R4; Slide 2: R3, R4, R5.
 Graph-based transitive closure: Slide 0: R1, R2, R3; Slide 1: R3, R4, R5. No need to compare R1/R2 with R4/R5.]

Integration
- Combining disparate data sources into a single schematic structure
- Schema integration: forming an integrated schematic structure from the disparate data sources
- Data integration: cleaning and merging data from different sources

Schema Integration
Consider the following schemata [4]:
  Cars(serialno, model, colour, stereo, glasstint, )
and
  Autos(serienNr, modelle, farbe)
  Optionen(serienNr, stereo, glastint, )

Schema Integration: Challenges
- Naming differences
- Structural differences
- Data type differences
- Missing fields
- Semantic differences

Schema Integration: Generic Architecture of an Integrator

[Figure: a Mediator / Constructor sits on top of several Wrapper / Extractor components, one per data source.]

Integration: Wrapper / Extractor
- Creates a common view across all data sources
- Bridges differences in naming, type and schema structure
- Wrappers do not physically extract data from the data sources

Integration: Mediator / Constructor
- Constructs an integrated schematic structure
- Performs data integration and populates the data warehouse
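As a rough illustration of the wrapper and mediator roles, the sketch below maps the Autos/Optionen schemata onto the Cars schema from the earlier slide. The function names `wrap_cars`, `wrap_autos` and `mediate` are hypothetical, the sample rows are invented for the example, and the "first source wins" rule for conflicting serial numbers is an assumption. Generators are used so that the wrappers only expose a view and copy nothing until the mediator pulls rows.

```python
def wrap_cars(cars_rows):
    """Source already uses the target naming: expose rows unchanged."""
    for car in cars_rows:
        yield dict(car)

def wrap_autos(autos_rows, optionen_rows):
    """Bridge naming and structural differences of the Autos/Optionen source."""
    options = {o["serienNr"]: o for o in optionen_rows}
    for auto in autos_rows:
        opt = options.get(auto["serienNr"], {})
        yield {
            "serialno": auto["serienNr"],
            "model": auto["modelle"],
            "colour": auto["farbe"],
            "stereo": opt.get("stereo"),        # missing field becomes None
            "glasstint": opt.get("glastint"),
        }

def mediate(*wrapped_sources):
    """Build the integrated data set; the first source wins on conflicts."""
    integrated = {}
    for source in wrapped_sources:
        for row in source:
            integrated.setdefault(row["serialno"], row)
    return list(integrated.values())

cars = [{"serialno": 1, "model": "A", "colour": "red", "stereo": True, "glasstint": False}]
autos = [{"serienNr": 2, "modelle": "B", "farbe": "blau"},
         {"serienNr": 3, "modelle": "C", "farbe": "rot"}]
optionen = [{"serienNr": 2, "stereo": True, "glastint": True}]
warehouse_rows = mediate(wrap_cars(cars), wrap_autos(autos, optionen))
```

Translating values (for example "blau" to "blue") would be a further data-level transformation rule and is left out here.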

Tools for Data Cleaning and Integration: dfpower
- From Dataflux corporation
- De-duplication engine
- Analyzes data based on values and number of occurrences
- Does not support detection of semantic duplicates based on user-specified rules
- Permits duplicates to be grouped or merged

Tools for Data Cleaning and Integration: ETI* Data Cleanse
- From Evolutionary Technologies Int'l
- Table-driven data cleaning, matching and quality review, duplicate matching, imprecise spelling correction
- Supports meta-data repositories to store schemas, transformation rules, interrelationships, etc.

Tools for Data Cleaning and Integration: SSA Name/Data Clustering Engine
- From Search Software America
- Addresses errors in spelling, typing, transcription, nicknames, synonyms, abbreviations, prefix/suffix variations, punctuation, casing, etc.
- Supports user-specified transformation rules
- Scalable up to 500 M records

Summary
- OLAP versus OLTP
- Characteristics of OLAP queries
- Data warehousing systems
- Data cleaning
  - Issues
  - Dirty data
  - Cleaning algorithms
- Integration of data and schema

Introduction to Data Warehousing and OLAP
Part II: Data Models and Warehouse Design

Example OLAP Queries
- How is the employee attrition scene changing over the years across the company?
- Is there a correlation between the geographical location of a company unit and excellent employee appraisals?
- Is it financially viable to continue our manufacturing unit in Taiwan?

OLAP Query Characteristics
- Aggregation and summarization over large data sets
- Clustering
- Trend detection
- Multi-dimensional projections

[Figure: A Typical Warehouse, showing a hypercube core and materialized views.]

A Typical Warehouse
- Hypercube core
  - Manages the atomic data elements
  - Global schematic structure for the entire warehouse
  - Based on the multi-dimensional data model
- Materialized views
  - Physical views for faster aggregate-query answering
  - De-normalization of the core

[Figure: The Sales Hyper Cube, with axes Product, Week and Branch.]

The Sales Hyper Cube
- Sales is the fact
- Branch, Product and Week are dimensions

Operations on Hyper Cubes
- Pivoting: choosing (rotating the cube on a pivot) a set of dimensions for display
- Slicing-dicing: selecting some subset of the cube
- Roll-up: aggregating a dimension to a smaller dimension (roll up the weeks dimension into months)
- Drill-down: opening an aggregated dimension to reveal details (open up months to reveal week-by-week information)
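A small sketch of slicing and roll-up on the sales cube, assuming the cube is held as a dictionary keyed by (branch, product, week). The sample figures, the function names and the fixed four-weeks-per-month mapping are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts keyed by the three dimensions (branch, product, week).
cube = {
    ("North", "Pen", 1): 10,
    ("North", "Pen", 2): 20,
    ("North", "Ink", 5): 5,
    ("South", "Pen", 1): 7,
}

def slice_cube(cube, branch):
    """Slice: fix one dimension value and keep the remaining dimensions."""
    return {(p, w): s for (b, p, w), s in cube.items() if b == branch}

def roll_up_weeks_to_months(cube, weeks_per_month=4):
    """Roll-up: aggregate the week dimension into a coarser month dimension."""
    rolled = defaultdict(int)
    for (branch, product, week), sales in cube.items():
        month = (week - 1) // weeks_per_month + 1
        rolled[(branch, product, month)] += sales
    return dict(rolled)

north = slice_cube(cube, "North")
monthly = roll_up_weeks_to_months(cube)
```

Drill-down is the reverse direction: it needs the week-level cube to be kept, since the monthly totals alone cannot be split back into weeks.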

Implementation of Hyper Cubes
- Multi-dimensional to relational mapping (ROLAP)
  - Map hyper cube queries to relational queries and maintain the data cube in a set of RDBMS tables
  - Ex: True Relational OLAP from Microstrategy Inc
- Native multidimensional model (MOLAP)
  - Use a separate storage model for multidimensional data
  - Ex: Arbor Essbase

[Figure: Physical models: Star. A central Sales fact table (Brnch, Prod, Wk, Sales) linked to the Branch, Product and Week dimension tables.]

Star Schema
- Features
  - Central fact table
  - Set of supporting dimension tables
  - Denormalized data storage
- Advantages
  - Simple to comprehend and design
  - Small meta-data
  - Quick query responses
- Limitations
  - Not robust towards changes (changes in dimension tables)
  - Enormous amount of redundancy in dimension-table data

[Figure: Physical models: Snowflake. A central Sales fact table (Brnch, Prod, Wk, Sales) linked to normalized dimension tables: Branch, Product, Division, Options, Scheme, Unit, Week.]

Snowflake Schema
- Features
  - Central fact table
  - Normalized dimension tables storing atomic data units
- Advantages
  - Faster query responses
  - Easy updates
- Limitations
  - Large amount of meta-data
  - May result in too many tables
  - Harder to comprehend manually

[Figure: Physical models: Constellation. A Sales fact table (Brnch, Prod, Wk, Sales) and a Discounts fact table (Wk, Prod, Sch, Dist) share the Branch, Product, Week and Scheme dimension tables.]

Constellation
- Most commonly used architecture
- Used when multiple fact tables are needed
- Usually has a main fact table and several auxiliary fact tables, which are summary tables or materialized views over the main fact table
- Helps in faster query answering for frequently asked queries
- Costlier to update than snowflake

Issues in Data Cubes
- Curse of high dimensionality
  - Currently known index structures degrade to linear search when the number of dimensions becomes high
- Categorical dimensions
  - In order to run certain algorithms like clustering, dimensions should belong to ordinal classes
  - Categorical dimensions are difficult to index
- Ordinal changes during aggregation
  - Certain dimensions may change their ordinal property when aggregated and should be indexed at several levels. Ex: student names are ordered lexicographically, but when aggregated into classes, are ordered on their graduation year.

The Time Dimension
- Mandatory in most warehouse applications
- Has several meanings and roll-up techniques depending on application context
  - Simple calendar-based rollup
  - Fiscal calendar-based rollup
  - Academic calendar-based rollup
- Special dates like releases, events, etc. need to be indexed separately
- Order of traversal of the time dimension is important

Materialized Views
- Summary tables that create physical views of the fact table
- Trade-off between faster query answering and increased complexity during updates
- When to materialize?
  - Use the result to search space (RSS) ratio of a query: (#rows returned / #rows scanned)
  - Summarize if the RSS ratio is too small and the query is too frequent
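The RSS rule above can be written down directly. The slides give no concrete thresholds, so `max_ratio` and `min_runs_per_day` below are hypothetical tuning knobs, as are the function names.

```python
def rss_ratio(rows_returned, rows_scanned):
    """Result to search space ratio: #rows returned / #rows scanned."""
    if rows_scanned <= 0:
        raise ValueError("rows_scanned must be positive")
    return rows_returned / rows_scanned

def should_materialize(rows_returned, rows_scanned, runs_per_day,
                       max_ratio=0.01, min_runs_per_day=10):
    """Materialize when a query scans far more than it returns AND runs often.

    The default thresholds are assumptions, not values from the slides.
    """
    return (rss_ratio(rows_returned, rows_scanned) < max_ratio
            and runs_per_day >= min_runs_per_day)

# A monthly-totals query: 12 result rows from a million scanned, run 50 times a day.
monthly_totals = should_materialize(12, 1_000_000, runs_per_day=50)
# A bulk export: returns half of what it scans, so a summary table would not help.
bulk_export = should_materialize(500_000, 1_000_000, runs_per_day=50)
```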

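Revision History Table(s)
- Manage data that is revised over time
- Queries select the appropriate value based on the relevant version
- Usually required for most warehousing applications

[Table: revision history example for Id 1 (turnover per employee) with columns Id, Val and Revised; the cell values are incomplete in this transcript: "110, ,045".]

Designing a Data Warehouse

[Figure: Enterprise Model -> DW Logical Model -> DW Physical Model. The first step is carried out by End-user + DBA, the second by DBA + Automated Tools.]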

Enterprise to Warehouse
Some rules of thumb:
- The warehouse logical model closely resembles the enterprise model
- Some transformation is usually necessary from enterprise to warehouse models
- The warehouse logical model should depict denormalized data sets implicit in the enterprise model
- Special planning is required for managing the time dimension and revision histories

OLTP to Warehouse Models
- OLTP databases are usually organized around the enterprise model
- OLTP schemas provide a good starting point for designing OLAP logical models

OLTP to Warehouse Models
Some rules of thumb when converting OLTP schemas into OLAP schemas:
- Look for operational data fields and remove them (Ex: a counter-sales table containing register number, cashier Emp_id)
- Add a time element (and version elements if necessary) to data sets before populating the warehouse
- Decide on derived data and summary tables at design time itself
- Iterate between transformation rule specification, integration and schema design
- Add the commonly required summary information ALL to every domain

Introduction to Data Warehousing and OLAP
Part III: Index structures and query processing

Classes of Dimensions
- Categorical: {cats, dogs, sheep, cows, bulls, buffaloes}
- Ordinal: totally ordered (integers), partially ordered (credentials of a candidate)
- Sparse: small number of data points per value
- Dense: large number of data points per value

Multi-dimensional Indexes
- Usually based around ordinal classes
- Different kinds of indexes for sparse and dense data sets
- Performance may depend on the storage structure for the data set

Representing Multi-dimensional Data: Multi-level sorting
- Sorts data based on different dimensions one after the other
- Simple to implement
- Searching is fast if the dominant attribute is part of the query
- Search becomes fragmented if the dominant attribute is omitted from the query

[Figure: data sorted on Dim 1, then Dim 2, then further dimensions.]

Representing Multi-dimensional Data: Space filling curves
- Sorted on all attributes at once
- Location of a data point is easily computable
- Suffers with an increase in the number of dimensions
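The slides do not name a particular space filling curve. As one common example, the Z-order (Morton) curve interleaves the bits of the coordinates, so a point's position on the curve is computed directly from its attribute values. The sketch below assumes two non-negative integer dimensions; the function names are hypothetical.

```python
def morton_encode(x, y, bits=16):
    """Interleave the bits of x and y: x takes even positions, y odd positions."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def morton_decode(code, bits=16):
    """Recover (x, y) from a Z-order code."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y

points = [(3, 5), (0, 0), (1, 0), (0, 1)]
# Sorting by the Z-order code orders the points on all attributes at once.
ordered = sorted(points, key=lambda p: morton_encode(*p))
```

With more dimensions each coordinate contributes fewer of the leading bits, which is one way to see why locality degrades as dimensionality grows.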

Multi-dimensional Indexes: Ordered index on multiple attributes
- Considers a composite key as a tuple of simple keys (k1, k2, ..., kn)
- Ordered index files are maintained by ordering each key in sequence

Multi-dimensional Indexes: Partitioned Hashing
- Given a composite key (k1, k2, ..., kn), partitioned hashing returns n different bucket numbers
- The hash bucket is determined by concatenating the n numbers
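A minimal sketch of partitioned hashing as described above: each component of the composite key is hashed to a fixed number of bits, and the bit strings are concatenated into one bucket number. The name `partitioned_hash`, the use of CRC32 as the per-component hash and the bit widths are assumptions made for illustration.

```python
import zlib

def partitioned_hash(key, bits_per_part):
    """Hash each key component to its own bit field and concatenate the fields."""
    if len(key) != len(bits_per_part):
        raise ValueError("one bit width is needed per key component")
    bucket = 0
    for part, bits in zip(key, bits_per_part):
        part_hash = zlib.crc32(repr(part).encode("utf-8")) % (1 << bits)
        bucket = (bucket << bits) | part_hash   # append this component's bits
    return bucket

# Composite key (grade, roll number): 2 bits for grade, 3 bits for roll number.
b1 = partitioned_hash(("A", 17), (2, 3))
b2 = partitioned_hash(("A", 42), (2, 3))
```

Because the grade occupies the leading two bits, a query that specifies only the grade can restrict its search to the buckets sharing that prefix instead of scanning all of them.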

Multi-dimensional Indexes: Grid Files
- Partitions the range of key values for each key into several buckets
- Combinations of the buckets of each key form a grid
- A grid file stores the grid like any other multidimensional data set

[Figure: Grid Files. A grid over Grade (A, B, C, D) and Roll No, whose cells point into a bucket pool.]

Multi-dimensional Indexes: Bit-map indexes
- Used on fields that are sparse (i.e. have only a small number of values, for example gender, grade, etc.)
- A bit vector enumerates all possible values and sets the corresponding bit for each data element
- Much more compact than other index structures
- Useful for efficiently answering composite queries over multiple bit-vectored fields
- Can be integrated with tree indexes

Multi-dimensional Indexes: Encoding Bit-map indexes
- Grade = {A, B, C, D, E, F}
- Subject = {DB, AI, PDS}
- Subject codes: DB = 001, AI = 010, PDS = 100, no value = 000

[The bit codes for grades A to F and the worked query "student who has scored A in DB and AI" are incomplete in this transcript.]
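The sketch below shows the plain form of a bit-map index, with one bit vector per distinct value and Python integers used as bit vectors. It does not reproduce the encoded variant from the slide, whose grade codes did not survive transcription. The sample rows and function names are assumptions.

```python
def build_bitmap_index(column):
    """One bit vector per distinct value; bit i is set if row i has that value."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index

def rows_of(bitmap):
    """List the row numbers whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

grades = ["A", "B", "A", "C", "A"]
subjects = ["DB", "DB", "AI", "DB", "DB"]
grade_index = build_bitmap_index(grades)
subject_index = build_bitmap_index(subjects)

# Composite query: grade = A AND subject = DB is a single bitwise AND.
a_in_db = rows_of(grade_index["A"] & subject_index["DB"])
```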

Multidimensional Tree Indexes: KD Trees
- A binary tree structure that can store n-dimensional data points
- Each dimension is compared at the appropriate level
- Useful for point queries

KD Trees
- Let data be represented as 2-dimensional points of the form (x, y) representing (salary, age)
- Example data set: (2500, 20) (5000, 32) (4500, 28) (2000, 23) (4800, 25) (1800, 18) (6500, 27)

KD Trees

[Figure: KD tree built from the points (2500, 20) (5000, 32) (4500, 28) (2000, 23) (4800, 25) (1800, 18) (6500, 27), and the corresponding partitioning of the (salary, age) plane.]
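A minimal KD tree sketch using the example data set, assuming points are inserted in the order listed and that the tree alternates between salary (even levels) and age (odd levels). The names `Node`, `insert` and `contains` are hypothetical, and the exact shape drawn on the original slide may differ.

```python
class Node:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None

def insert(root, point, depth=0):
    """Insert a point, comparing one dimension per level in rotation."""
    if root is None:
        return Node(point)
    axis = depth % len(point)          # 0 = salary, 1 = age for 2-d points
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root

def contains(root, point, depth=0):
    """Point query: follow the same comparisons used during insertion."""
    if root is None:
        return False
    if root.point == point:
        return True
    axis = depth % len(point)
    child = root.left if point[axis] < root.point[axis] else root.right
    return contains(child, point, depth + 1)

data = [(2500, 20), (5000, 32), (4500, 28), (2000, 23),
        (4800, 25), (1800, 18), (6500, 27)]
tree = None
for p in data:
    tree = insert(tree, p)
```

Inserting the same points in a different order produces a differently shaped tree, which is the sensitivity to insertion order that the slides point out.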

KD Trees
- Each point divides the search space along one of the dimensions
- The structure of the tree (and hence its performance) is sensitive to the order of insertion of data points

Quad Trees
- Initially, the index contains only one bucket representing the entire space
- If the number of data points in any bucket exceeds the maximum limit, the bucket is split in two along each dimension, and the resulting buckets are added as children of the larger bucket
- When the number of dimensions is 2, splitting results in four children (a quad)

[Figure: Quad Trees.]

R Trees
- Manage regions
- Leaf nodes represent data regions and non-leaf nodes represent virtual (non-data) regions
- A node is split when it contains too many regions
- Addition of a region begins from the root node and proceeds until the smallest accommodating region is found (possibly by splitting one or more regions)
- Sibling regions may overlap but may not subsume one another

[Figure: R Trees, showing data regions nested inside virtual regions.]

R Trees
- Suitable for range, neighborhood and nearness searches
- Tree structure and performance are sensitive to the order in which data regions are added
- Suffer from the curse of high dimensionality

Indexing Categorical Data (Ref [7])
- Categorical data
  - Have no ordinal relationship
  - Cannot be compared, except for equality
  - Can be represented as sets in many cases
- Example categorical attributes: team members of a given project, ingredients for a given recipe, products manufactured by a unit, etc.
- Comparison operators on sets: equality, membership, superset, subset

Signatures
- Represent a set as a bitmap where each bit corresponds to an object in a larger universe of discourse (UoD)
- UoD = {set of all ingredients}
- S, T ⊆ UoD: ingredients for two recipes
- s, t: the corresponding bitmaps of S and T
- Queries:
  - S ⊆ T  if and only if  s & ~t = 0
  - S ⊇ T  if and only if  t & ~s = 0
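The two signature tests above translate directly into bitwise operations on integers. In this sketch the universe of ingredients, the recipes and the function names are invented for illustration; each object in the universe gets its own bit, so the tests are exact.

```python
def signature(items, universe):
    """Bitmap with one bit per object of the universe of discourse."""
    position = {obj: i for i, obj in enumerate(universe)}
    sig = 0
    for item in items:
        sig |= 1 << position[item]
    return sig

def is_subset(s, t):
    """S is a subset of T exactly when s has no bit outside t."""
    return s & ~t == 0

def is_superset(s, t):
    """S is a superset of T exactly when t has no bit outside s."""
    return t & ~s == 0

universe = ["flour", "sugar", "egg", "milk", "butter"]
pancake = signature({"flour", "egg", "milk"}, universe)
cake = signature({"flour", "sugar", "egg", "milk", "butter"}, universe)
```

When signatures are shortened by hashing several objects onto the same bit, the same tests can report false positives, so matching records then have to be checked against the actual sets.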

Signature Trees
- Leaf nodes contain (signature, data pointer) pairs
- Non-leaf nodes are formed by bit-wise ORing of their children nodes
- Traverse the tree by ANDing the query signature with the node signature

Extensible Signature Hashing
- Hash tables are constructed based on the most significant d bits of the signature
- Hash levels are extended by extending d whenever overflow occurs

[Figure: Extensible Signature Hashing with global depth n = 3. Buckets with local depths d = 2, d = 3, d = 3 and d = 1; the legible bucket labels are "records whose hash values start with 010", "start with 011" and "start with 1".]

Summary
- The OLAP hypercube
- Materialized views
- ROLAP and MOLAP implementations
- Star, Snowflake and Constellation
- Time dimensions and revision tables
- Rules of thumb for OLAP design
- Multi-dimensional index structures

Furthermore
Topics not addressed for reasons of brevity:
- Query language constructs
- Data mining over warehouses
- Handling semi-structured data in warehouses
- Performance tuning
- Maintenance of materialized views
- Browsing and visualization

Thank You

References
1. Erhard Rahm, Hong Hai Do. Data Cleaning: Problems and Current Approaches. Bulletin of the Technical Committee on Data Engineering, IEEE Computer Society, Vol. 23, No. 4, Dec.
2. Vijay T. Raisinghani. Cleaning Methods in Data Warehousing. PhD seminar report, IIT Bombay, Dec.
3. A. Monge, C. Elkan. An efficient domain-independent algorithm for detecting approximately duplicate database records. In Proceedings of the SIGMOD 1997 Workshop on Data Mining and Knowledge Discovery, May.
4. H. Garcia-Molina, J.D. Ullman, J. Widom. Database Systems: The Complete Book. Pearson Education.
5. R. Agrawal, A. Gupta, S. Sarawagi. Modeling multidimensional databases. ICDE.
6. Oliver Guenther. Data Warehouses and Data Mining. Course Notes, Humboldt University, Berlin.
7. S. Helmer, G. Moerkotte. A study of four index structures for set-valued attributes of low cardinality. Reihe Informatik 2, University of Mannheim.

Conferences and Workshops
- DaWaK: Data Warehousing and Knowledge Discovery
- VLDB: Very Large Databases
- EDBT: Extending Database Technology
- DOLAP: ACM International Workshop on Data Warehousing and OLAP

Some WWW Links
- DW Infocenter
- Data.com
- The Data Warehousing Institute
- KDNuggets, a comprehensive portal on knowledge discovery
- Oracle Data Warehousing Tutorial (registration required) ( ow_to04.html)
- Data Warehouse: Online Resources ( htm)
- Data Warehousing and OLAP bibliography

Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. # 31 Introduction to Data Warehousing and OLAP Part 2 Hello and

More information

DATA WAREHOUSING AND OLAP TECHNOLOGY

DATA WAREHOUSING AND OLAP TECHNOLOGY DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1 Slide 29-1 Chapter 29 Overview of Data Warehousing and OLAP Chapter 29 Outline Purpose of Data Warehousing Introduction, Definitions, and Terminology Comparison with Traditional Databases Characteristics

More information

Week 3 lecture slides

Week 3 lecture slides Week 3 lecture slides Topics Data Warehouses Online Analytical Processing Introduction to Data Cubes Textbook reference: Chapter 3 Data Warehouses A data warehouse is a collection of data specifically

More information

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria

More information

Multi-dimensional index structures Part I: motivation

Multi-dimensional index structures Part I: motivation Multi-dimensional index structures Part I: motivation 144 Motivation: Data Warehouse A definition A data warehouse is a repository of integrated enterprise data. A data warehouse is used specifically for

More information

Review. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies

Review. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies Review Data Warehousing CPS 216 Advanced Database Systems Data warehousing: integrating data for OLAP OLAP versus OLTP Warehousing versus mediation Warehouse maintenance Warehouse data as materialized

More information

Data Warehousing Systems: Foundations and Architectures

Data Warehousing Systems: Foundations and Architectures Data Warehousing Systems: Foundations and Architectures Il-Yeol Song Drexel University, http://www.ischool.drexel.edu/faculty/song/ SYNONYMS None DEFINITION A data warehouse (DW) is an integrated repository

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART A: Architecture Chapter 1: Motivation and Definitions Motivation Goal: to build an operational general view on a company to support decisions in

More information

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key

More information

A Critical Review of Data Warehouse

A Critical Review of Data Warehouse Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 95-103 Research India Publications http://www.ripublication.com A Critical Review of Data Warehouse Sachin

More information

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT BUILDING BLOCKS OF DATAWAREHOUSE G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT 1 Data Warehouse Subject Oriented Organized around major subjects, such as customer, product, sales. Focusing on

More information

Indexing Techniques for Data Warehouses Queries. Abstract

Indexing Techniques for Data Warehouses Queries. Abstract Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 sirirut@cs.ou.edu gruenwal@cs.ou.edu Abstract Recently,

More information

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing 1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing 2. What is a Data warehouse a. A database application

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing Class Projects Class projects are going very well! Project presentations: 15 minutes On Wednesday

More information

Chapter 20: Data Analysis

Chapter 20: Data Analysis Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

More information

A Simplified Framework for Data Cleaning and Information Retrieval in Multiple Data Source Problems

A Simplified Framework for Data Cleaning and Information Retrieval in Multiple Data Source Problems A Simplified Framework for Data Cleaning and Information Retrieval in Multiple Data Source Problems Agusthiyar.R, 1, Dr. K. Narashiman 2 Assistant Professor (Sr.G), Department of Computer Applications,

More information

www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 28

www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 28 Data Warehousing - Essential Element To Support Decision- Making Process In Industries Ashima Bhasin 1, Mr Manoj Kumar 2 1 Computer Science Engineering Department, 2 Associate Professor, CSE Abstract SGT

More information

DATA CUBES E0 261. Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

DATA CUBES E0 261. Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Slide 1 Introduction Increasingly, organizations are analyzing historical data to identify useful patterns and

More information

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA OLAP and OLTP AMIT KUMAR BINDAL Associate Professor Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information, which is created by data,

More information

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina Data Warehousing Read chapter 13 of Riguzzi et al Sistemi Informativi Slides derived from those by Hector Garcia-Molina What is a Warehouse? Collection of diverse data subject oriented aimed at executive,

More information

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehousing and OLAP Technology for Knowledge Discovery 542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories

More information

1960s 1970s 1980s 1990s. Slow access to

1960s 1970s 1980s 1990s. Slow access to Principles of Knowledge Discovery in Fall 2002 Chapter 2: Warehousing and Dr. Osmar R. Zaïane University of Alberta Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta

More information

Part 22. Data Warehousing

Part 22. Data Warehousing Part 22 Data Warehousing The Decision Support System (DSS) Tools to assist decision-making Used at all levels in the organization Sometimes focused on a single area Sometimes focused on a single problem

More information

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000 2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000 Introduction This course provides students with the knowledge and skills necessary to design, implement, and deploy OLAP

More information


