European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project


1 European Archival Records and Knowledge Preservation
Database Archiving in the E-ARK Project
Janet Delve, University of Portsmouth
Kuldar Aas, National Archives of Estonia
Rainer Schmidt, Austrian Institute of Technology
DLM Forum, 13 November 2014

2 THE E-ARK PROJECT IS CO-FUNDED BY THE EUROPEAN COMMISSION UNDER THE ICT-PSP PROGRAMME

3 Outline
- E-ARK objectives
- Current practices and needs
- Transactional (OLTP) vs analytical (OLAP / data warehousing) techniques
- Database archiving in E-ARK

4 E-ARK Facts and Figures
- EU CIP PCP ICT Programme, Objective 2.5: eArchiving services, Pilot B
- The pilot should share information on integration, operation and interoperability issues throughout the EU in order to facilitate the creation and maintenance of a European archiving infrastructure for government and public services, thus promoting the re-use of archival data.
- 36 months: February 2014 to January 2017
- [...] M budget, 3 M funded by EC
- 16 partners
- [...] for all deliverables, [...] for all software

5 E-ARK Objectives
Reduce the cost of transfer, preservation and access to digital information by:
- Standardising how agencies export and send information to digital archives
- Providing open formats for the long-term preservation of various content
- Exploring needs for accessing archives and providing novel interfaces for these
Long-term vision: improve semantic and technical interoperability to a level which allows any system developer to deliver out-of-the-box archiving functionality

6 [Diagram: E-ARK workflow, labels in transcription order]
- Pre-Ingest: E-ARK SIP; SIP creation tools; archival records; content and records management systems
- Ingest and Preservation: SIP - AIP conversion; digital preservation systems; scalable computation; E-ARK AIP; AIP - DIP conversion; CMIS interface; data mining interface; E-ARK DIP
- Access: archival search, access and display tools; content and records management systems; data mining showcase

7 Current database archiving practice
- Snapshot policy
- Ingest: transform the original relational structure into open formats
- Formats: SIARD, ADDML, DBML
- Access: users need to find the appropriate snapshot(s), load these into a current DBMS and use predefined queries or build their own
[Diagram: a DB snapshot containing Table 1, Table 2 and Table 3 (each with a primary key and rows) and a Codes table, passing through Ingest and Access]

8 Problems
Finding the appropriate snapshot
- Most users search for data about something: "Which car had the plate number 111YYY on 15 January 2000?"
- Current practice only allows searching for the database snapshot which includes data about something: "Which database includes information about cars on 15 January 2000?"
Scope of the snapshot
- The scope of data and the time period covered in a single snapshot usually do not meet the needs of the user
Required technical knowledge
- Relational structures are often highly optimised and hard to grasp
- Most users do not have the knowledge to build accurate queries for specific access needs
- The only way is to use predefined queries which have been archived along with the data
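The snapshot-based access path criticised above can be sketched as follows. This is a minimal illustration, assuming a snapshot has already been restored into a relational engine; Python's built-in sqlite3 stands in for "a current DBMS", and the table names, columns, sample rows and the archived query are all hypothetical, not taken from any E-ARK deliverable.

```python
import sqlite3

# Hypothetical content of one restored snapshot: a vehicle register plus a code table.
SNAPSHOT_SQL = """
CREATE TABLE codes (code_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE vehicles (
    vehicle_id INTEGER PRIMARY KEY,
    plate      TEXT,
    type_code  INTEGER REFERENCES codes(code_id),
    valid_from TEXT,   -- ISO 8601 dates compare correctly as strings
    valid_to   TEXT
);
INSERT INTO codes VALUES (1, 'car'), (2, 'truck');
INSERT INTO vehicles VALUES
    (10, '111YYY', 1, '1999-03-01', '2001-06-30'),
    (11, '111YYY', 2, '2001-07-01', '2005-12-31'),
    (12, '222ZZZ', 1, '1998-01-01', '2003-01-01');
"""

# A predefined query that would be archived along with the data (hypothetical).
PLATE_ON_DATE = """
SELECT v.vehicle_id, c.label
FROM vehicles AS v
JOIN codes AS c ON c.code_id = v.type_code
WHERE v.plate = ? AND ? BETWEEN v.valid_from AND v.valid_to
"""


def load_snapshot():
    """Restore the snapshot into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SNAPSHOT_SQL)
    return conn


def vehicle_with_plate(conn, plate, iso_date):
    """Run the archived query: which vehicle carried this plate on this date?"""
    return conn.execute(PLATE_ON_DATE, (plate, iso_date)).fetchall()
```

The user still has to know which snapshot to restore and that the join through the code table is required, which is exactly the technical knowledge the slide says most users lack.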

9 In the ideal world users do not need to search for databases but for data!
- Semantic reuse
- Topic-based reuse
- Big data analysis

10 HOW TO DO IT?

11 Transactional Processing (OLTP)

12 Online Analytical Processing (OLAP)

13 Data warehousing
- Updates only from DB snapshots (useful for DB archiving)
- Can be denormalised
- Star schema dimensional model

14 Star Schema
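As a rough illustration of the star schema named on this slide: one central fact table holds the measures and points to several denormalised dimension tables. The sketch below uses Python's built-in sqlite3; the vehicle-registration tables, columns and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by three dimension tables.
STAR_SCHEMA = """
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_vehicle (vehicle_key INTEGER PRIMARY KEY, plate TEXT, vehicle_type TEXT);
CREATE TABLE dim_region  (region_key INTEGER PRIMARY KEY, county TEXT, country TEXT);
CREATE TABLE fact_registration (
    date_key    INTEGER REFERENCES dim_date(date_key),
    vehicle_key INTEGER REFERENCES dim_vehicle(vehicle_key),
    region_key  INTEGER REFERENCES dim_region(region_key),
    fee         REAL    -- the measure
);
INSERT INTO dim_date VALUES (1, 2000, 1), (2, 2001, 1);
INSERT INTO dim_vehicle VALUES (1, '111YYY', 'car'), (2, '222ZZZ', 'truck');
INSERT INTO dim_region VALUES (1, 'Harju', 'Estonia');
INSERT INTO fact_registration VALUES (1, 1, 1, 10.0), (1, 2, 1, 25.0), (2, 1, 1, 12.0);
"""


def build_star():
    """Create the schema and sample rows in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    return conn


def fees_by_year_and_type(conn):
    """Aggregate the measure along two dimensions with one join per dimension."""
    return conn.execute(
        """
        SELECT d.year, v.vehicle_type, SUM(f.fee)
        FROM fact_registration AS f
        JOIN dim_date    AS d ON d.date_key = f.date_key
        JOIN dim_vehicle AS v ON v.vehicle_key = f.vehicle_key
        GROUP BY d.year, v.vehicle_type
        ORDER BY d.year, v.vehicle_type
        """
    ).fetchall()
```

Every analytical query follows the same shape (fact table joined to the dimensions of interest), which is what makes this layout easier to query than a highly optimised operational schema.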

15 Database archiving in E-ARK
Overview of E-ARK concepts
- Archiving of databases in different layers: primary format, semantic representation, representation for analytical processing.
- Data-intensive technology for AIP storage and processing: Hadoop, HDFS, HBase, Lily, Solr.
- Support for database transformation and analysis such as denormalization, aggregation, indexing.
- Levels of DIP format and display: access to archived records, based on OLAP queries and reports, as dynamically reconstructed RDBs.

16 Extract Transform Load
Goal: integrate data from multiple applications into a database / warehouse.
- Extract: data from source systems such as RDBMSs and flat files.
- Transform: derive, extract, aggregate data.
- Load data into target: overwrite cumulative information or add new data.
Important pre-processing step for data mining / analytics; involves data cleaning and data integration.
Result: structured data, random access based on data-based indexes (e.g. RDBMS, NoSQL).
E-ARK approach
- Automated transformation of archived databases into snowflake schema representation(s).
- Denormalized, connected fact and dimension tables
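The three ETL stages can be sketched in a few lines of plain Python. This is an illustration only, not the E-ARK transformation tooling: the flat-file content, the column names and the surrogate-key scheme are assumptions made for the example.

```python
import csv
import io

# Hypothetical flat-file export from a source system (note the messy "type" value).
SOURCE_CSV = (
    "plate,type,registered,fee\n"
    "111YYY, Car ,2000-01-15,10.0\n"
    "222ZZZ,truck,2000-02-01,25.0\n"
    "111YYY,car,2001-03-09,12.0\n"
)


def extract(text):
    """Extract: read raw records from the flat file."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean values and derive the year from the registration date."""
    return [
        {
            "plate": row["plate"].strip().upper(),
            "type": row["type"].strip().lower(),
            "year": int(row["registered"][:4]),
            "fee": float(row["fee"]),
        }
        for row in rows
    ]


def load(rows):
    """Load: add rows to dimension tables (with surrogate keys) and a fact table."""
    dim_vehicle = {}  # (plate, type) -> surrogate key
    dim_year = {}     # year -> surrogate key
    facts = []        # (vehicle_key, year_key, fee)
    for row in rows:
        vehicle_key = dim_vehicle.setdefault((row["plate"], row["type"]), len(dim_vehicle) + 1)
        year_key = dim_year.setdefault(row["year"], len(dim_year) + 1)
        facts.append((vehicle_key, year_key, row["fee"]))
    return dim_vehicle, dim_year, facts
```

Calling `load(transform(extract(SOURCE_CSV)))` yields two vehicle entries, two year entries and three fact rows; the cleaning step is what lets the first and third source records resolve to the same vehicle.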

17 Indexing - Full Text Search
- Searching documents based on full text, as distinguished from searches based on metadata
- Returns a (ranked) list of document IDs
Information retrieval methods involved
- Building an inverted index
- Scoring and weighting results
- Text classification
- Evaluation
Approach in E-ARK
- Denormalization / star schema transformation and ingestion into Apache HBase.
- Repository and faceted search on records based on NGDATA's Lily repository and Apache Solr.
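To make "building an inverted index" and "scoring and weighting" concrete, here is a small pure-Python sketch. It is not how Solr or Lily work internally; the tokenizer, the TF-IDF-style weight and the sample records are simplifying assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a posting list {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Return document IDs ranked by a simple TF-IDF score (ties broken by ID)."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical archived records, flattened to text.
DOCS = {
    "r1": "car registration 111YYY Tallinn",
    "r2": "truck registration 222ZZZ",
    "r3": "car car inspection report",
}
```

With these records, `search(build_inverted_index(DOCS), len(DOCS), "car inspection")` ranks `r3` ahead of `r1`, because `r3` contains both query terms and mentions "car" twice.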

18 Online analytical processing (OLAP)
OLAP database / data warehouse
- Aggregated, historical data, low transaction rate
- Resource-intensive and complex queries
- Analyse multi-dimensional data in a read-efficient manner (web analytics, sales)
- View metrics by combination of dimensions
- Time vs. space: pre-aggregates data to build a cube
- Dynamically analyse data from multiple perspectives: roll-up, drill-down, and slicing and dicing
Approach in E-ARK
- Data analytics based on pre-processed database representation arranged along dimensions.
- Data loaded and queried through Apache HBase.
- Access (DIPs) supported by additional use of OLAP tools.
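The time-versus-space trade-off and the roll-up and slice operations can be illustrated with a tiny in-memory cube. This is a teaching sketch in plain Python, not the HBase-backed approach described on the slide; the dimensions, the measure and the fact rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("year", "region", "vehicle_type")

# Hypothetical fact rows with a single measure, "count".
FACTS = [
    {"year": 2000, "region": "north", "vehicle_type": "car", "count": 3},
    {"year": 2000, "region": "south", "vehicle_type": "car", "count": 2},
    {"year": 2000, "region": "north", "vehicle_type": "truck", "count": 1},
    {"year": 2001, "region": "north", "vehicle_type": "car", "count": 4},
]


def build_cube(facts, measure="count"):
    """Pre-aggregate the measure for every subset of dimensions (space traded for query time)."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            cuboid = defaultdict(int)
            for fact in facts:
                cuboid[tuple(fact[d] for d in dims)] += fact[measure]
            cube[dims] = dict(cuboid)
    return cube


def roll_up(cube, dims):
    """Look up the pre-aggregated cells for the chosen dimensions."""
    return cube[tuple(d for d in DIMENSIONS if d in dims)]


def slice_cube(cube, dims, **fixed):
    """Fix some of the chosen dimensions to a value and return the remaining cells.

    Every dimension named in `fixed` must also appear in `dims`.
    """
    kept = [d for d in DIMENSIONS if d in dims]
    result = {}
    for key, value in roll_up(cube, dims).items():
        cell = dict(zip(kept, key))
        if all(cell[d] == v for d, v in fixed.items()):
            result[tuple(cell[d] for d in kept if d not in fixed)] = value
    return result
```

Drill-down is the reverse of roll-up here: moving from `roll_up(cube, {"year"})` to `roll_up(cube, {"year", "region"})` adds a dimension and shows finer-grained cells, with no re-scan of the fact rows.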

19 Data Mining
- Identify correlations and patterns in existing data
- Used by statisticians, database and business communities
Data analysis techniques
- Regression: predict continuous-valued output (e.g. price)
- Classification: discrete-valued output (e.g. character recognition)
- Segmentation: separates data into interesting groups
- Based on mathematical methods: pattern matching, machine learning, numerical analysis
E-ARK data mining is mainly based on text mining.
- Using the data structure of the repository or search index
- Scalable through MapReduce / Apache Mahout.
- Goal: clustering, labeling, anomaly detection
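As an illustration of the segmentation / clustering goal, the following is a minimal k-means sketch in plain Python. It is not the MapReduce or Apache Mahout implementation the slide refers to; the deterministic initialisation (the first k points) and the two-dimensional sample points are assumptions made to keep the example small and repeatable.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: first k points as seeds
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


# Hypothetical feature vectors, e.g. two numeric features per record.
POINTS = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.2, 7.9), (0.9, 1.1)]
```

For text mining the points would be term-weight vectors derived from the repository or search index rather than hand-written coordinates; the assignment and update steps stay the same.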

20 Proposed Architecture
[Diagram: EARK-AIP; Data Management Application; ESS Arch Preservation Platform; Data Mining Showcase (T6.4, D6.3); Data Connector API, CRUD API, Query API, Data Mining API; AIP Storage (T6.2, MS10); Data Management Integration (T6.1, MS06, D6.2); Query and Indexing (T6.3, MS04, D6.1); Scalable Computation Staging Area (Lily, Hadoop, HBase, HDFS); Re-use and Data Mining (T6.4); Archive Storage (WORM)]

21 Tasks and Components
Archival Storage
- Store AIPs on HDFS using the ESS Preservation Platform
- Bulk-load, permanent and replicated storage
Data Integration
- Extract data from the archival information package.
- ETL data into Lily/HBase, keep the AIP in HDFS (don't touch)
Query and Indexing
- Metadata on AIP level stored in HBase for basic retrieval
- Faceted search based on Apache Solr
Data Mining and Analytics
- Load OLAP structure from package
- Data sets stored on record level in HBase
- Query for facts based on different dimensions and levels.
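One way to picture "query for facts based on different dimensions and levels" on a store that keeps rows sorted by key, as HBase does, is a composite row key whose prefix encodes the dimension hierarchy. The sketch below is an in-memory stand-in written for illustration: the class, the key layout and the sample rows are hypothetical and do not use the HBase or Lily APIs.

```python
import bisect


class SortedKeyValueStore:
    """In-memory stand-in for a store that keeps rows sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys kept in sorted order
        self._rows = {}  # row key -> row data

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def scan_prefix(self, prefix):
        """Yield (key, row) for every key that starts with the prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]


def row_key(year, region, record_id):
    """Hypothetical composite key: coarse level first, finer levels after."""
    return f"{year:04d}|{region}|{record_id}"


store = SortedKeyValueStore()
store.put(row_key(2000, "north", "a1"), {"count": 3})
store.put(row_key(2000, "north", "a2"), {"count": 1})
store.put(row_key(2000, "south", "b1"), {"count": 2})
store.put(row_key(2001, "north", "c1"), {"count": 4})
```

Scanning with the prefix `"2000|"` answers a year-level question and `"2000|north|"` a year-and-region question; a different leading dimension would need a second key layout or a separate index.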



More information

TRANSFORMING YOUR BUSINESS

TRANSFORMING YOUR BUSINESS September, 21 2012 TRANSFORMING YOUR BUSINESS PROCESS INTO DATA MODEL Prasad Duvvuri AST Corporation Agenda First Step Analysis Data Modeling End Solution Wrap Up FIRST STEP It Starts With.. What is the

More information

CS54100: Database Systems

CS54100: Database Systems CS54100: Database Systems Date Warehousing: Current, Future? 20 April 2012 Prof. Chris Clifton Data Warehousing: Goals OLAP vs OLTP On Line Analytical Processing (vs. Transaction) Optimize for read, not

More information

ORACLE TAX ANALYTICS. The Solution. Oracle Tax Data Model KEY FEATURES

ORACLE TAX ANALYTICS. The Solution. Oracle Tax Data Model KEY FEATURES ORACLE TAX ANALYTICS KEY FEATURES A set of comprehensive and compatible BI Applications. Advanced insight into tax performance Built on World Class Oracle s Database and BI Technology Design after the

More information

Data Warehousing. Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de. Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

Data Warehousing. Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de. Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1 Jens Teubner Data Warehousing Winter 2015/16 1 Data Warehousing Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Winter 2015/16 Jens Teubner Data Warehousing Winter 2015/16 13 Part II Overview

More information

This Symposium brought to you by www.ttcus.com

This Symposium brought to you by www.ttcus.com This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data

More information

B.Sc (Computer Science) Database Management Systems UNIT-V

B.Sc (Computer Science) Database Management Systems UNIT-V 1 B.Sc (Computer Science) Database Management Systems UNIT-V Business Intelligence? Business intelligence is a term used to describe a comprehensive cohesive and integrated set of tools and process used

More information

Introduction to Databases, Fall 2004 IT University of Copenhagen. Lecture 6, part 2: OLAP and data cubes. October 8, Lecturer: Rasmus Pagh

Introduction to Databases, Fall 2004 IT University of Copenhagen. Lecture 6, part 2: OLAP and data cubes. October 8, Lecturer: Rasmus Pagh Introduction to Databases, Fall 2004 IT University of Copenhagen Lecture 6, part 2: OLAP and data cubes October 8, 2004 Lecturer: Rasmus Pagh Today s lecture, part II Information integration. On-Line Analytical

More information

MDM and Data Warehousing Complement Each Other

MDM and Data Warehousing Complement Each Other Master Management MDM and Warehousing Complement Each Other Greater business value from both 2011 IBM Corporation Executive Summary Master Management (MDM) and Warehousing (DW) complement each other There

More information

Data warehouse and Business Intelligence Collateral

Data warehouse and Business Intelligence Collateral Data warehouse and Business Intelligence Collateral Page 1 of 12 DATA WAREHOUSE AND BUSINESS INTELLIGENCE COLLATERAL Brains for the corporate brawn: In the current scenario of the business world, the competition

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information

A Technical Review on On-Line Analytical Processing (OLAP)

A Technical Review on On-Line Analytical Processing (OLAP) A Technical Review on On-Line Analytical Processing (OLAP) K. Jayapriya 1., E. Girija 2,III-M.C.A., R.Uma. 3,M.C.A.,M.Phil., Department of computer applications, Assit.Prof,Dept of M.C.A, Dhanalakshmi

More information

Analyzing Polls and News Headlines Using Business Intelligence Techniques

Analyzing Polls and News Headlines Using Business Intelligence Techniques Analyzing Polls and News Headlines Using Business Intelligence Techniques Eleni Fanara, Gerasimos Marketos, Nikos Pelekis and Yannis Theodoridis Department of Informatics, University of Piraeus, 80 Karaoli-Dimitriou

More information

SAS BI Course Content; Introduction to DWH / BI Concepts

SAS BI Course Content; Introduction to DWH / BI Concepts SAS BI Course Content; Introduction to DWH / BI Concepts SAS Web Report Studio 4.2 SAS EG 4.2 SAS Information Delivery Portal 4.2 SAS Data Integration Studio 4.2 SAS BI Dashboard 4.2 SAS Management Console

More information

SIZE & ESTIMATION OF DATA WAREHOUSE SYSTEMS

SIZE & ESTIMATION OF DATA WAREHOUSE SYSTEMS SIZE & ESTIMATION OF DATA WAREHOUSE SYSTEMS Luca Santillo (luca.santillo@gmail.com) Abstract Data Warehouse Systems are a special context for the application of functional software metrics. The use of

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design COURSE OUTLINE Track 1 Advanced Data Modeling, Analysis and Design TDWI Advanced Data Modeling Techniques Module One Data Modeling Concepts Data Models in Context Zachman Framework Overview Levels of Data

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

IAF Business Intelligence Solutions Make the Most of Your Business Intelligence. White Paper November 2002

IAF Business Intelligence Solutions Make the Most of Your Business Intelligence. White Paper November 2002 IAF Business Intelligence Solutions Make the Most of Your Business Intelligence White Paper INTRODUCTION In recent years, the amount of data in companies has increased dramatically as enterprise resource

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

The Benefits of Data Modeling in Data Warehousing

The Benefits of Data Modeling in Data Warehousing WHITE PAPER: THE BENEFITS OF DATA MODELING IN DATA WAREHOUSING The Benefits of Data Modeling in Data Warehousing NOVEMBER 2008 Table of Contents Executive Summary 1 SECTION 1 2 Introduction 2 SECTION 2

More information

FEDERATED DATA SYSTEMS WITH EIQ SUPERADAPTERS VS. CONVENTIONAL ADAPTERS WHITE PAPER REVISION 2.7

FEDERATED DATA SYSTEMS WITH EIQ SUPERADAPTERS VS. CONVENTIONAL ADAPTERS WHITE PAPER REVISION 2.7 FEDERATED DATA SYSTEMS WITH EIQ SUPERADAPTERS VS. CONVENTIONAL ADAPTERS WHITE PAPER REVISION 2.7 INTRODUCTION WhamTech offers unconventional data access, analytics, integration, sharing and interoperability

More information

Mario Guarracino. Data warehousing

Mario Guarracino. Data warehousing Data warehousing Introduction Since the mid-nineties, it became clear that the databases for analysis and business intelligence need to be separate from operational. In this lecture we will review the

More information

Data Warehousing Systems: Foundations and Architectures

Data Warehousing Systems: Foundations and Architectures Data Warehousing Systems: Foundations and Architectures Il-Yeol Song Drexel University, http://www.ischool.drexel.edu/faculty/song/ SYNONYMS None DEFINITION A data warehouse (DW) is an integrated repository

More information

Data Warehouse Architecture

Data Warehouse Architecture Anwendungssoftwares a -Warehouse-, -Mining- und OLAP-Technologien Warehouse Architecture Overview Warehouse Architecture Sources and Quality Mart Federated Information Systems Operational Store Metadata

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Part I: Data Warehousing Gao Cong gaocong@cs.aau.dk Slides adapted from Man Lung Yiu and Torben Bach Pedersen Course Structure Business intelligence: Extract knowledge

More information

Data Warehousing and OLAP Technology for Knowledge Discovery

Data Warehousing and OLAP Technology for Knowledge Discovery 542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information