Data Warehouse Technology And The MSD Databases

1 Data Warehouse Technology And The MSD Databases Philip McNeil

2 Data Warehouses; The MSD Databases; Populating & Using the Search Database

3 Data Warehouses

4 What is a Data Warehouse? A subject-oriented, integrated, nonvolatile, and time-variant collection of data that is used primarily in organizational decision making. (W. H. Inmon, Building the Data Warehouse, John Wiley & Sons, 2002)

5 Data Warehouse: Subject-Oriented
  Organised around major subjects, such as customer, product, sales
  Focuses on the modelling and analysis of data for decision makers, not on daily operations or transaction processing
  Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process

6 Data Warehouse: Integrated
  Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records
  Data cleaning and data integration techniques are applied
    Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
    When data are moved to the warehouse, they are converted into a consistent format to facilitate integration

7 Data Warehouse: Non-Volatile
  A physically separate store of data, transformed from the operational environment
  Operational update of data does not occur in the data warehouse environment
    Does not require transaction processing, recovery, and concurrency control mechanisms
    Requires only two operations in data accessing: initial loading of data and access of data

8 Data Warehouse: Time-Variant
  The time horizon for the data warehouse is significantly longer than that of operational systems
    Operational database: current value data
    Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
  Every key structure in the data warehouse contains an element of time, explicitly or implicitly
    The key of operational data may or may not contain a time element

9 Dimensional Modelling
  A typical commercial data warehouse is based on a multidimensional data model, which views data in the form of a data cube
  A data cube (such as sales) allows data to be modelled and viewed in multiple dimensions
    Dimension tables, representing important subject areas, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
    Fact table, containing varying levels of summarised data (such as dollars_sold) and keys to each of the related dimension tables
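The dimensional model on this slide can be illustrated with a small, runnable sketch. It is not taken from the presentation: the table names (item_dim, time_dim, sales_fact), the sample rows and the use of SQLite through Python's standard sqlite3 module are all assumptions made for illustration. Only the attribute names (item_name, brand, type, quarter, dollars_sold) come from the slide.

```python
import sqlite3

def build_star(conn):
    """Create two dimension tables and one fact table (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY,
                               item_name TEXT, brand TEXT, type TEXT);
        CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY,
                               day INTEGER, month INTEGER, quarter TEXT, year INTEGER);
        -- The fact table holds the measure plus a key into each dimension.
        CREATE TABLE sales_fact (time_key INTEGER REFERENCES time_dim(time_key),
                                 item_key INTEGER REFERENCES item_dim(item_key),
                                 dollars_sold REAL);
    """)
    conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?)", [
        (1, "laptop", "Acme", "computer"),
        (2, "phone", "Acme", "handset"),
        (3, "tablet", "Zenith", "computer"),
    ])
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)", [
        (1, 15, 1, "Q1", 2004),
        (2, 20, 5, "Q2", 2004),
    ])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [
        (1, 1, 1000.0), (1, 2, 300.0), (1, 3, 450.0),
        (2, 1, 1200.0), (2, 3, 500.0),
    ])

def dollars_by_brand_and_quarter(conn):
    """View the 'cube' along two dimensions: brand and quarter."""
    return conn.execute("""
        SELECT i.brand, t.quarter, SUM(s.dollars_sold)
        FROM sales_fact s
        JOIN item_dim i ON i.item_key = s.item_key
        JOIN time_dim t ON t.time_key = s.time_key
        GROUP BY i.brand, t.quarter
        ORDER BY i.brand, t.quarter
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star(conn)
star_result = dollars_by_brand_and_quarter(conn)
```

Grouping by different dimension attributes (for example type and year) gives another view of the same facts without changing the schema.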

10 Data Warehouse Models
  Star schema
    A fact table, containing varying levels of summarised data, in the middle, connected to a set of dimension tables representing important subject areas
  Snowflake schema
    A refinement of the star schema in which some dimensional hierarchies are normalised into a set of smaller dimension tables, forming a shape similar to a snowflake
  Fact constellation
    Multiple fact tables share dimension tables; viewed as a collection of stars, and therefore also called a galaxy schema

11 Star Schema

12 Star Schema with Sample Data

13 Snowflake Schema
  Merchandise (ItemID, Description, QuantityOnHand, ListPrice, Category)
  Transactions (SaleID, ItemID, Quantity, SalePrice, Amount)
  Sale (SaleID, SaleDate, EmployeeID, CustomerID, SalesTax)
  Customer (CustomerID, Phone, FirstName, LastName, Address, ZipCode, CityID)
  City (CityID, ZipCode, City, State)
  Dimension tables can join to other dimension tables.
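To show what "dimension tables can join to other dimension tables" means in practice, here is a minimal sketch using a subset of the tables and columns listed on the slide. The sample rows are invented, and SQLite (via Python's sqlite3 module) stands in for whatever DBMS the original example used. The query reaches the State attribute by walking from the fact table through Sale and Customer to City.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City (CityID INTEGER PRIMARY KEY, ZipCode TEXT, City TEXT, State TEXT);
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, LastName TEXT,
                           CityID INTEGER REFERENCES City(CityID));
    CREATE TABLE Sale (SaleID INTEGER PRIMARY KEY, SaleDate TEXT,
                       CustomerID INTEGER REFERENCES Customer(CustomerID));
    CREATE TABLE Transactions (SaleID INTEGER REFERENCES Sale(SaleID),
                               ItemID INTEGER, Quantity INTEGER,
                               SalePrice REAL, Amount REAL);

    -- Invented sample data.
    INSERT INTO City VALUES (1, '10001', 'New York', 'NY'),
                            (2, '94105', 'San Francisco', 'CA');
    INSERT INTO Customer VALUES (1, 'Smith', 1), (2, 'Jones', 2), (3, 'Brown', 1);
    INSERT INTO Sale VALUES (1, '2004-01-05', 1), (2, '2004-01-06', 2), (3, '2004-01-07', 3);
    INSERT INTO Transactions VALUES (1, 10, 2, 5.0, 10.0), (1, 11, 1, 20.0, 20.0),
                                    (2, 10, 1, 5.0, 5.0), (3, 12, 3, 4.0, 12.0);
""")

# Fact table -> Sale -> Customer -> City: the Customer dimension is itself
# normalised, so reaching State costs one extra join compared with a star schema.
amount_by_state = conn.execute("""
    SELECT ci.State, SUM(t.Amount)
    FROM Transactions t
    JOIN Sale s      ON s.SaleID = t.SaleID
    JOIN Customer cu ON cu.CustomerID = s.CustomerID
    JOIN City ci     ON ci.CityID = cu.CityID
    GROUP BY ci.State
    ORDER BY ci.State
""").fetchall()
```

In a pure star schema the City attributes would be folded into Customer, trading some redundancy for one fewer join.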

14 Fact Constellation
  Sales Fact Table (time_key, item_key, branch_key, location_key, unit_sold, euros_sold, avg_sales)
  Shipping Fact Table (time_key, item_key, shipper_key, from_location_key, to_location_key, unit_shipped)
  Time (time_key, day, day_of_the_week, month, quarter, year)
  Item (item_key, item_name, brand, type, supplier_key)
  Branch (branch_key, branch_name, branch_type)
  Location (location_key, street, city, province/state, country)
  Shipper (shipper_key, shipper_name, location_key, shipper_type)
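A fact constellation can be sketched with two fact tables that share one dimension. The example below is an illustration under assumptions: it keeps only the shared item dimension and one measure per fact table (unit_sold, unit_shipped, as named on the slide), the rows are invented, and SQLite stands in for the real DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE sales_fact (item_key INTEGER REFERENCES item(item_key),
                             unit_sold INTEGER);
    CREATE TABLE shipping_fact (item_key INTEGER REFERENCES item(item_key),
                                unit_shipped INTEGER);

    -- Invented sample data.
    INSERT INTO item VALUES (1, 'laptop'), (2, 'phone');
    INSERT INTO sales_fact VALUES (1, 3), (1, 2), (2, 7);
    INSERT INTO shipping_fact VALUES (1, 4), (2, 5), (2, 1);
""")

# Each fact table is aggregated on its own and then lined up on the shared
# dimension. Joining the two fact tables directly would multiply rows and
# inflate the sums.
sold_vs_shipped = conn.execute("""
    SELECT i.item_name,
           (SELECT COALESCE(SUM(s.unit_sold), 0)
              FROM sales_fact s WHERE s.item_key = i.item_key),
           (SELECT COALESCE(SUM(sh.unit_shipped), 0)
              FROM shipping_fact sh WHERE sh.item_key = i.item_key)
    FROM item i
    ORDER BY i.item_name
""").fetchall()
```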

15 Why Build a Data Warehouse?
  Access to data from multiple sources, giving a comprehensive data collection
  Separate transactional and analysis systems
    Improved query response time (without slowing down transaction processing)
    Easier formulation of complex queries
  Access to historical data (not held in operational systems)
  Improved data quality (fewer errors and missing values)

16 The Data Warehouse Pipeline
  [Figure: Operational DBs and other sources -> Extract / Transform / Load / Refresh -> Data Warehouse (data storage) -> Serve -> OLAP Server (OLAP engine) -> front-end tools: Analysis, Query, Reports, Data mining]
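The extract, transform and load stages of the pipeline can be sketched in a few lines. Everything specific here is an assumption for illustration: the two toy sources, their field names, the gender encoding and the target table are invented, and SQLite plays the role of the warehouse. The transform step shows the kind of naming and encoding clean-up described on the "Integrated" slide.

```python
import csv
import io
import sqlite3

# Two heterogeneous, invented sources: a flat file and rows from another system.
CSV_SOURCE = "cust_id,sex,amount\n1,M,10.50\n2,F,7.25\n"
DB_SOURCE = [{"customer": 3, "gender": "female", "total": 4.0}]

def extract():
    """Read raw records from each source without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    return csv_rows, DB_SOURCE

def transform(csv_rows, db_rows):
    """Map both sources onto one naming convention and one encoding."""
    gender_codes = {"M": "male", "F": "female"}
    unified = [(int(r["cust_id"]), gender_codes[r["sex"]], float(r["amount"]))
               for r in csv_rows]
    unified += [(r["customer"], r["gender"], float(r["total"])) for r in db_rows]
    return unified

def load(conn, rows):
    """Append the cleaned rows to the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(customer_id INTEGER, gender TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
loaded = conn.execute(
    "SELECT customer_id, gender, amount FROM sales ORDER BY customer_id"
).fetchall()
```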

17 Using the Warehouse
  Ad hoc query
    Simple submission of SQL statements from a command-line tool or a SQL-generation tool
  Complex analytical questions
    Using custom-written query tools or commercial online analytical processing (OLAP) tools
  Data mining
    Searching data sets for hitherto unknown correlations

18 The MSD Databases

19 What is a Data Warehouse? (revisited) From the MSD Viewpoint: A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context. -- Barry Devlin, IBM Consultant

20 The MSD Databases
  The MSD actually consists of two separate databases:
    The deposition database is highly normalised, with thousands of relationships linking some 325 tables; it is the definitive archive for all structural data at MSD
    The search database is a simpler but larger denormalised database, which contains a large amount of additional derived data, with data items duplicated and aggregated into 170 much wider tables, making it more amenable to searching and retrieval of data
  A third, intermediate database is involved in transforming the data from the deposition database to the search database and in calculating and adding the derived data

21 Deposition Database
  The deposition database comprises:
    Common reference data, such as amino-acid connectivity, HET group structures, etc.
    Older PDB entries, loaded from legacy files
      The schema includes strict constraints, enforcing internal consistency and performing type checking and validation against the reference data
      A huge amount of effort has gone into cleaning up the legacy data
    New entries, loaded from recent PDB submissions
      New entries are loaded on a weekly basis, subject to the same constraints and checks during loading as legacy data

22 Deposition Database Schema!

23 Part of Deposition Database Schema
  [Figure: entity-relationship diagram. Entities recoverable from the figure include DEP_PDB_TAXONOMY, ETAXI, ETAXI SYNONYMS, NCBI, NCBI SYNONYMS, NTX CHANGED TAX ID, DEP_ENTITY, DEP_POLY_ENTITY, DEP_POLY_ENTITY_MASTER, DEP_POLY_ENTITY_SEGMENT, DEP_POLY_ENTITY_SEQ, DEP_NATURAL_SOURCE, DEPOSITION, DEP_SEQ_REF, DEP_SEQ_MATCH, DEP_SEQ_CORR, REF_SEQ_REF, REF_SEQ_REF_SEQ, DEP_RESIDUE and REF_CHEM_COMP.]

24 NOT The MSD Data Warehouse
  The MSD Search Database:
    Is not a true data warehouse
    Breaks several of the data warehouse cardinal rules
      Non-volatile
      Time-variant
    But does make use of many data warehousing concepts and techniques
    Is closest to a fact constellation

25 Search Database
  Each data item occurs only once in the deposition database, so the data from a single entry are spread across many tables
  To make searching faster, the data are aggregated into fewer, larger tables in the search database
  Searching the search database requires fewer table joins, making database queries significantly faster and much less complex
  The top-level entity in a structure entry is the assembly, as determined using the Protein Quaternary Structure server (PQS)
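The trade-off described on this slide can be sketched as follows. The schema is a toy stand-in, not the MSD schema: the table names (entry, chain, residue, residue_search), the sample rows and the placeholder entry codes are assumptions, and SQLite is used only so that the example runs anywhere. The same question is answered once against the normalised tables and once against a wide, pre-joined table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised form: each data item is stored once.
    CREATE TABLE entry (entry_id INTEGER PRIMARY KEY, pdb_code TEXT);
    CREATE TABLE chain (chain_id INTEGER PRIMARY KEY, entry_id INTEGER, chain_code TEXT);
    CREATE TABLE residue (residue_id INTEGER PRIMARY KEY, chain_id INTEGER,
                          code_3_letter TEXT);

    -- Invented sample data with placeholder entry codes.
    INSERT INTO entry VALUES (1, '1abc'), (2, '2xyz');
    INSERT INTO chain VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'A');
    INSERT INTO residue VALUES (1, 1, 'ARG'), (2, 1, 'LEU'), (3, 2, 'ARG'), (4, 3, 'TRP');

    -- Denormalised form: the joins are paid once, at transformation time.
    CREATE TABLE residue_search AS
        SELECT r.residue_id, r.code_3_letter, c.chain_code, e.pdb_code
        FROM residue r
        JOIN chain c ON c.chain_id = r.chain_id
        JOIN entry e ON e.entry_id = c.entry_id;
""")

# Which chains contain an arginine? Two joins against the normalised tables...
normalised = conn.execute("""
    SELECT e.pdb_code, c.chain_code
    FROM residue r
    JOIN chain c ON c.chain_id = r.chain_id
    JOIN entry e ON e.entry_id = c.entry_id
    WHERE r.code_3_letter = 'ARG'
    ORDER BY e.pdb_code, c.chain_code
""").fetchall()

# ...and no joins at all against the wide table.
denormalised = conn.execute("""
    SELECT pdb_code, chain_code
    FROM residue_search
    WHERE code_3_letter = 'ARG'
    ORDER BY pdb_code, chain_code
""").fetchall()
```

The cost is duplication: pdb_code and chain_code are now repeated on every residue row, which is acceptable in a database that is rebuilt from the authoritative source rather than updated in place.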

26 Representing Macromolecular Structures (1)
  [Figure: hierarchy Exp. result -> Assembly -> Chains -> Residues -> Atoms, shown against the tables ENTRY, ASSEMBLY, CHAIN, ASSEMBLY DATA, MODEL, ATOM DATA, ALT, ATOM and RESIDUE]

27 Representing Macromolecular Structures (2)
  [Figure: hierarchy levels labelled ASU (observed exp data), Biological Unit(s), Independent units, Chains, Residues, Atoms]
  Each level of the hierarchy can have associated properties, e.g.
    Bound molecules
    Domains
    Site residues
    Derived properties (e.g. asa)
    Reference information (e.g. standard geometry)

28 Derived Data
  During transformation from the deposition database to the search database, additional derived data are added
  Numerous processes are run on the deposition data, including:
    Characterisation of ligand binding sites
    Derivation of secondary structure information
    Mapping data onto other databases such as UniProt, Pfam, InterPro, GO, SCOP and CATH

29 Search Database Data Models
  Entity-relationship data model
    As used in the deposition database, but denormalised to aid querying
    Could be generated from the deposition data model
  Dimensional model
    Typical of commercial data warehouses
    Requires a separate data model
  A hybrid of these two is required to handle the complexity of macromolecular structure data
    Very complex
    Fact tables could be dimensions for other fact tables

30 Part of Search Database Schema
  [Figure: schema diagram of foreign-key links. Tables recoverable from the figure include STRAND DATA, SHEET ORDER, SHEET, SHEET HBOND, HAIRPIN MOTIF, HELIX DATA, HELIX, BULGE, TURN, COMPONENT DATA, COMPONENT, DEPOSITION, ASSEMBLY DATA, ASSEMBLY, CHAIN, MODEL, ALT, ATOM DATA, ATOM, CATH_MAPPING, ETAXI, ETAXI SYNONYMS, NCBI and NCBI SYNONYMS.]
  MSD Search Database marts shown: Secondary Structure, Coordinates-Sequence, Taxonomy, CATH

31 Star Database Design
  Fact table (residue_contact)
    Interactions: number of interactions, strongest interaction, geometry, bond type
  Dimension tables
    PDB Entry, Assembly, Model, Residue, Ligand, Neighbour, Chemical compound
    Secondary structure: Helix, Turn, Strand

32 From Snowflake to Star
  [Figure: tables Chemical compound, PDB entry, Assembly, Residue and Residue Contact]

33 Populating & Using The Search Database

34 From Deposition to Search Database
  Deposition database: normalised relationships; authoritative; complete
  Transformation
  Search database: denormalised; fewer relationships; derived information; subset of data

35 MSD Database Pipeline
  [Figure: deposit -> Deposition database (40 GB) -> load via mmCIF, transform -> Transformation database (200 GB) -> replicate -> Search database (200 GB) -> search, distribute (DISTRIBUTION). Also shown: web services on the deposition and search sides; PDB files and derived data from external processes (130 GB).]

36 Transformation
  Moving from a complex normalised model, designed to enforce integrity, to a simple, efficient, user-oriented model
  Assignment of consistent identifiers
    In addition to PDB identifiers
  Extensive indexing
  Based on a flexible, metadata-driven mechanism, developed in-house to overcome Oracle limitations
    Models composite entities and their dependencies
    Allows incremental transformation

37 Post Transformation
  Calculation of derived/aggregated data
    Consistent across the whole archive
    Ability to query derived data
  Adding value to the database
    Active-site information
    Structure, sequence alignment
    Cross-referencing: SCOP, CATH, Pfam, UniProt, InterPro
  Additional derived data and indexing required for web-based search services
    Stored in the search database, but conceptually part of the search systems
    Scientific parameterisation reflects the requirements of the search services

38 Efficiently Querying The Database
  Requires using many of the tools provided by the Oracle DBMS
    Star joins
    Star transformations
    Bitmap indexes
    Index-organized tables
    Set operations (INTERSECT, MINUS, UNION)
    Optimizer hints (e.g. LEADING, FACT, NO_MERGE)
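Of the tools listed, the set operations are easy to demonstrate outside Oracle. The sketch below is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module, where Oracle's MINUS is spelled EXCEPT, and the residue_search table, its columns and its rows are invented. Star transformations, bitmap indexes, index-organized tables and optimizer hints are Oracle features and are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE residue_search (pdb_code TEXT, code_3_letter TEXT);
    -- Invented sample data: which residue types occur in which entry.
    INSERT INTO residue_search VALUES
        ('e1', 'ARG'), ('e1', 'TRP'),
        ('e2', 'ARG'),
        ('e3', 'TRP'), ('e3', 'LEU');
""")

WITH_ARG = "SELECT pdb_code FROM residue_search WHERE code_3_letter = 'ARG'"
WITH_TRP = "SELECT pdb_code FROM residue_search WHERE code_3_letter = 'TRP'"

def codes(sql):
    """Run a one-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]

# Entries containing both residue types.
both = codes(f"{WITH_ARG} INTERSECT {WITH_TRP} ORDER BY 1")
# Entries with ARG but without TRP (MINUS in Oracle, EXCEPT in SQLite).
arg_only = codes(f"{WITH_ARG} EXCEPT {WITH_TRP} ORDER BY 1")
# Entries containing either residue type, duplicates removed.
either = codes(f"{WITH_ARG} UNION {WITH_TRP} ORDER BY 1")
```

Combining simple single-table selections this way is one means of expressing multi-criteria searches without writing a large self-join.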

39 3D Spatial Searches
  Search example: find the following triangle site
    Vertex 1: Cbeta of Isoleucine or Leucine
    Vertex 2: Cbeta of Arginine
    Vertex 3: Cbeta of Tryptophan or Tyrosine or Phenylalanine
    Each pair of vertices is 6-8 Angstroms apart

40 The Query
  select d1.atom_data1_id, d1.atom_data2_id
  from (select /*+ NO_MERGE INDEX(atomic_dists) */ atom_data1_id, atom_data2_id
        from atomic_dists
        where dist_id in (select id from dists
                          where code_3_letter1 in ('ILE','LEU')
                            and code_3_letter2 in ('TRP','TYR','PHE')
                            and chem_atom1_name = 'CB'
                            and chem_atom2_name = 'CB'
                            and dist in (6,7,8))) d1,
       (select /*+ NO_MERGE INDEX(atomic_dists) */ atom_data1_id, atom_data2_id
        from atomic_dists
        where dist_id in (select id from dists
                          where code_3_letter1 = 'ARG'
                            and code_3_letter2 in ('ILE','LEU')
                            and chem_atom1_name = 'CB'
                            and chem_atom2_name = 'CB'
                            and dist in (6,7,8))) d2,
       (select /*+ NO_MERGE INDEX(atomic_dists) */ atom_data1_id, atom_data2_id
        from atomic_dists
        where dist_id in (select id from dists
                          where code_3_letter1 = 'ARG'
                            and code_3_letter2 in ('TRP','TYR','PHE')
                            and chem_atom1_name = 'CB'
                            and chem_atom2_name = 'CB'
                            and dist in (6,7,8))) d3
  where d1.atom_data1_id = d2.atom_data2_id
    and d1.atom_data2_id = d3.atom_data2_id
    and d2.atom_data1_id = d3.atom_data1_id;
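The geometric condition behind this query can be sketched without a database. This is a brute-force illustration, not the MSD method: the query on the slide looks up precomputed, binned distances (the dists and atomic_dists tables), whereas the code below computes distances directly from invented coordinates. The atom tuples, function names and sample atoms are assumptions.

```python
from itertools import product
from math import dist

# (atom_id, residue 3-letter code, atom name, (x, y, z)); invented coordinates.
ATOMS = [
    (1, "LEU", "CB", (0.0, 0.0, 0.0)),
    (2, "ARG", "CB", (7.0, 0.0, 0.0)),
    (3, "TRP", "CB", (3.5, 6.0, 0.0)),
    (4, "PHE", "CB", (30.0, 30.0, 30.0)),  # right residue type, but too far away
    (5, "ARG", "CA", (7.0, 0.5, 0.0)),     # right residue type, but not a Cbeta
]

def cbeta(atoms, residues):
    """Select the Cbeta atoms that belong to one of the given residue types."""
    return [a for a in atoms if a[2] == "CB" and a[1] in residues]

def in_range(a, b, lo=6.0, hi=8.0):
    """True if the two atoms are between lo and hi Angstroms apart."""
    return lo <= dist(a[3], b[3]) <= hi

def find_triangles(atoms):
    """Return (hydrophobic, arginine, aromatic) atom-id triples forming the site."""
    hydrophobic = cbeta(atoms, {"ILE", "LEU"})
    arginine = cbeta(atoms, {"ARG"})
    aromatic = cbeta(atoms, {"TRP", "TYR", "PHE"})
    return [
        (h[0], r[0], w[0])
        for h, r, w in product(hydrophobic, arginine, aromatic)
        if in_range(h, r) and in_range(h, w) and in_range(r, w)
    ]

triangles = find_triangles(ATOMS)
```

The three range checks play the same role as the three sub-selects d1, d2 and d3 in the SQL, and the shared loop variables play the role of its three join conditions. Testing every triple scales poorly, which is presumably why the database version works from precomputed distance tables.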

41 Managing Data
  The information available in the MSD database is organised into application areas (data marts)
  Users may replicate only the data marts that they are interested in
  Some data marts are quite valuable and still small enough to be used on desktop systems, as in the demonstration
  The data marts are also loosely interrelated and can be synchronised independently

42 Data Marts
  The search database is divided into application areas, or data marts:
    Structure Data Descriptions
    Secondary Structure
    Taxonomy
    Ligands
    Experimental details
    Citations
    Mapping to UniProt, SCOP, CATH, Pfam, InterPro, GO, IntEnz, PubMed
    Active Sites
    Structural-Sequence alignment
  Each data mart can be distributed and managed separately

43 What is the Search Database used for?
  A target for data integration
    efamily
  A direct backend for web-based services
    MSDLite, MSDPro, MSDSite etc.
  A source for data files
    Data exported from the DB to support web-based services (indirectly a backend)
      Atlas pages
    XML representation of sections of the DB
      Including efamily
    Coordinates in PDB format for software that requires it
  Data mining
    MSDMine

44 Database Documentation

45 Database Documentation (2)


More information

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES MUHAMMAD KHALEEL (0912125) SZABIST KARACHI CAMPUS Abstract. Data warehouse and online analytical processing (OLAP) both are core component for decision

More information

Data Warehousing and OLAP

Data Warehousing and OLAP 1 Data Warehousing and OLAP Hector Garcia-Molina Stanford University Warehousing Growing industry: $8 billion in 1998 Range from desktop to huge: Walmart: 900-CPU, 2,700 disk, 23TB Teradata system Lots

More information

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches Concepts of Database Management Seventh Edition Chapter 9 Database Management Approaches Objectives Describe distributed database management systems (DDBMSs) Discuss client/server systems Examine the ways

More information

Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives

Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives Describe how the problems of managing data resources in a traditional file environment are solved

More information

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University Bussiness Intelligence and Data Warehouse Schedule Bussiness Intelligence (BI) BI tools Oracle vs. Microsoft Data warehouse History Tools Oracle vs. Others Discussion Business Intelligence (BI) Products

More information

INFO 321, Database Systems, Semester 2 2012

INFO 321, Database Systems, Semester 2 2012 References References INFO 321 Chapter 3: Decision Support Systems Department of Information Science Semester 2, 2012 General Kifer Chapter 17 Silberschatz (5th ed.) Chapter 18 Data Warehousing for Cavemen

More information

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse? Outline Data Warehousing What is a data warehouse? Why a warehouse? Models & operations Implementing a warehouse 2 What is a Warehouse? Collection of diverse data subject oriented aimed at executive, decision

More information

Overview. Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Data Warehousing. An Example: The Store (e.g.

Overview. Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Data Warehousing. An Example: The Store (e.g. Overview Data Warehousing and Decision Support Chapter 25 Why data warehousing and decision support Data warehousing and the so called star schema MOLAP versus ROLAP OLAP, ROLLUP AND CUBE queries Design

More information

Course 103402 MIS. Foundations of Business Intelligence

Course 103402 MIS. Foundations of Business Intelligence Oman College of Management and Technology Course 103402 MIS Topic 5 Foundations of Business Intelligence CS/MIS Department Organizing Data in a Traditional File Environment File organization concepts Database:

More information

Week 13: Data Warehousing. Warehousing

Week 13: Data Warehousing. Warehousing 1 Week 13: Data Warehousing Warehousing Growing industry: $8 billion in 1998 Range from desktop to huge: Walmart: 900-CPU, 2,700 disk, 23TB Teradata system Lots of buzzwords, hype slice & dice, rollup,

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Chapter 5 Foundations of Business Intelligence: Databases and Information Management 5.1 Copyright 2011 Pearson Education, Inc. Student Learning Objectives How does a relational database organize data,

More information

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of An Introduction to Data Warehousing An organization manages information in two dominant forms: operational systems of record and data warehouses. Operational systems are designed to support online transaction

More information

DATA WAREHOUSING - OLAP

DATA WAREHOUSING - OLAP http://www.tutorialspoint.com/dwh/dwh_olap.htm DATA WAREHOUSING - OLAP Copyright tutorialspoint.com Online Analytical Processing Server OLAP is based on the multidimensional data model. It allows managers,

More information

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2 Class Announcements TIM 50 - Business Information Systems Lecture 15 Database Assignment 2 posted Due Tuesday 5/26 UC Santa Cruz May 19, 2015 Database: Collection of related files containing records on

More information

Data Warehousing and Data Mining in Business Applications

Data Warehousing and Data Mining in Business Applications 133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business

More information

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING Ramesh Babu Palepu 1, Dr K V Sambasiva Rao 2 Dept of IT, Amrita Sai Institute of Science & Technology 1 MVR College of Engineering 2 [email protected]

More information

The Role of Data Warehousing Concept for Improved Organizations Performance and Decision Making

The Role of Data Warehousing Concept for Improved Organizations Performance and Decision Making Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 10, October 2014,

More information

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence Appliances and DW Architectures John O Brien President and Executive Architect Zukeran Technologies 1 TDWI 1 Agenda What

More information

GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington

GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington GEOG 482/582 : GIS Data Management Lesson 10: Enterprise GIS Data Management Strategies Overview Learning Objective Questions: 1. What are challenges for multi-user database environments? 2. What is Enterprise

More information

New Approach of Computing Data Cubes in Data Warehousing

New Approach of Computing Data Cubes in Data Warehousing International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 14 (2014), pp. 1411-1417 International Research Publications House http://www. irphouse.com New Approach of

More information

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehousing and OLAP Technology for Knowledge Discovery 542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories

More information

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key

More information

Hybrid Support Systems: a Business Intelligence Approach

Hybrid Support Systems: a Business Intelligence Approach Journal of Applied Business Information Systems, 2(2), 2011 57 Journal of Applied Business Information Systems http://www.jabis.ro Hybrid Support Systems: a Business Intelligence Approach Claudiu Brandas

More information

CHAPTER 4: BUSINESS ANALYTICS

CHAPTER 4: BUSINESS ANALYTICS Chapter 4: Business Analytics CHAPTER 4: BUSINESS ANALYTICS Objectives Introduction The objectives are: Describe Business Analytics Explain the terminology associated with Business Analytics Describe the

More information

DATA MINING AND WAREHOUSING CONCEPTS

DATA MINING AND WAREHOUSING CONCEPTS CHAPTER 1 DATA MINING AND WAREHOUSING CONCEPTS 1.1 INTRODUCTION The past couple of decades have seen a dramatic increase in the amount of information or data being stored in electronic format. This accumulation

More information

Advanced Data Management Technologies

Advanced Data Management Technologies ADMT 2015/16 Unit 2 J. Gamper 1/44 Advanced Data Management Technologies Unit 2 Basic Concepts of BI and Data Warehousing J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Acknowledgements:

More information

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

low-level storage structures e.g. partitions underpinning the warehouse logical table structures DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures

More information

IT0457 Data Warehousing. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

IT0457 Data Warehousing. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT IT0457 Data Warehousing G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT Outline What is data warehousing The benefit of data warehousing Differences between OLTP and data warehousing The architecture

More information

Performance Enhancement Techniques of Data Warehouse

Performance Enhancement Techniques of Data Warehouse Performance Enhancement Techniques of Data Warehouse Mahesh Kokate VJTI-Mumbai, India [email protected] Shrinivas Karwa VJTI, Mumbai- India [email protected] Saurabh Suman VJTI-Mumbai, India

More information

A Survey on Data Warehouse Architecture

A Survey on Data Warehouse Architecture A Survey on Data Warehouse Architecture Rajiv Senapati 1, D.Anil Kumar 2 1 Assistant Professor, Department of IT, G.I.E.T, Gunupur, India 2 Associate Professor, Department of CSE, G.I.E.T, Gunupur, India

More information

Indexing Techniques for Data Warehouses Queries. Abstract

Indexing Techniques for Data Warehouses Queries. Abstract Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 [email protected] [email protected] Abstract Recently,

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Part I: Data Warehousing Gao Cong [email protected] Slides adapted from Man Lung Yiu and Torben Bach Pedersen Course Structure Business intelligence: Extract knowledge

More information

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS DATA WAREHOUSE CONCEPTS A fundamental concept of a data warehouse is the distinction between data and information. Data is composed of observable and recordable facts that are often found in operational

More information

Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring

Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring www.ijcsi.org 78 Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring Mohammed Mohammed 1 Mohammed Anad 2 Anwar Mzher 3 Ahmed Hasson 4 2 faculty

More information

Database Design Patterns. Winter 2006-2007 Lecture 24

Database Design Patterns. Winter 2006-2007 Lecture 24 Database Design Patterns Winter 2006-2007 Lecture 24 Trees and Hierarchies Many schemas need to represent trees or hierarchies of some sort Common way of representing trees: An adjacency list model Each

More information

What is Management Reporting from a Data Warehouse and What Does It Have to Do with Institutional Research?

What is Management Reporting from a Data Warehouse and What Does It Have to Do with Institutional Research? What is Management Reporting from a Data Warehouse and What Does It Have to Do with Institutional Research? Emily Thomas Stony Brook University AIRPO Winter Workshop January 2006 Data to Information Historically

More information

THE QUALITY OF DATA AND METADATA IN A DATAWAREHOUSE

THE QUALITY OF DATA AND METADATA IN A DATAWAREHOUSE THE QUALITY OF DATA AND METADATA IN A DATAWAREHOUSE Carmen Răduţ 1 Summary: Data quality is an important concept for the economic applications used in the process of analysis. Databases were revolutionized

More information

SIZE & ESTIMATION OF DATA WAREHOUSE SYSTEMS

SIZE & ESTIMATION OF DATA WAREHOUSE SYSTEMS SIZE & ESTIMATION OF DATA WAREHOUSE SYSTEMS Luca Santillo ([email protected]) Abstract Data Warehouse Systems are a special context for the application of functional software metrics. The use of

More information

Chapter 3 - Data Replication and Materialized Integration

Chapter 3 - Data Replication and Materialized Integration Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 [email protected] Chapter 3 - Data Replication and Materialized Integration Motivation Replication:

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Chapter 6 Foundations of Business Intelligence: Databases and Information Management 6.1 2010 by Prentice Hall LEARNING OBJECTIVES Describe how the problems of managing data resources in a traditional

More information

OLAP Theory-English version

OLAP Theory-English version OLAP Theory-English version On-Line Analytical processing (Business Intelligence) [Ing.J.Skorkovský,CSc.] Department of corporate economy Agenda The Market Why OLAP (On-Line-Analytic-Processing Introduction

More information

Basics of Dimensional Modeling

Basics of Dimensional Modeling Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimensional

More information

Data Warehousing Concepts

Data Warehousing Concepts Data Warehousing Concepts JB Software and Consulting Inc 1333 McDermott Drive, Suite 200 Allen, TX 75013. [[[[[ DATA WAREHOUSING What is a Data Warehouse? Decision Support Systems (DSS), provides an analysis

More information