Monitoring Genebanks using Datamarts based in an Open Source Tool




Edwin Rojas, Research Informatics Unit (RIU), International Potato Center (CIP)
GPG2 Workshop 2008, April 10th, 2008

Datamarts Motivation
Huge amounts of data need to be summarized in various forms so that data creators and data users can get quick overviews and dig into details as needed. Producing large numbers of reports by hand, even with commodity tools like Excel, quickly becomes impractical. The business community has similar needs and has developed both commercial solutions (Microsoft data marts) and free open source solutions (Mondrian-Pentaho).

How Datamart Works
[Diagram] Data flow: CIP Relational Database (MS-SQL) -> Job Script (PHP) -> CIP Multidimensional Database (MySQL) -> HTML pivot table and chart.
Diagram labels: Relational Model, Multidimensional Model, DBDesigner, Eclipse Plug-in, Mondrian, Java/Swing application (http://sourceforge.net/projects/rubik/).

Main Features of Mondrian Datamart
- Cross-tables with corresponding charts
- Drill-down from summary data to details (exploration)
- Flexible cross-table configuration (add, remove, or move fields; switch rows and columns)
- Export to Excel and PDF
- Web-based and stand-alone applications
- Integration with an open source BI suite (Pentaho)
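To make the cross-table idea concrete, here is a minimal Python sketch of building a cross-table and switching its rows and columns. It is not Mondrian code: the `cross_table` function, the field names, and the accession counts are hypothetical and serve only as illustration.

```python
from collections import defaultdict

def cross_table(rows, row_field, col_field, measure):
    """Sum `measure` into a nested dict keyed by row value, then column value."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_field]][row[col_field]] += row[measure]
    return {key: dict(cells) for key, cells in table.items()}

# Hypothetical sample data (not taken from the CIP databases).
rows = [
    {"genus": "Solanum", "country": "Peru", "accessions": 120},
    {"genus": "Solanum", "country": "Bolivia", "accessions": 30},
    {"genus": "Ipomoea", "country": "Peru", "accessions": 45},
]

# Genus on rows, country on columns.
by_genus = cross_table(rows, "genus", "country", "accessions")
# Switching rows and columns only swaps the two field arguments.
by_country = cross_table(rows, "country", "genus", "accessions")
```

Drilling down corresponds to replacing a field with a more detailed one (for example, species instead of genus) and rebuilding the table.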

DataMart Architecture
- Data Warehouse Client (web, stand-alone app)
- Data Warehouse Engine (OLAP), MDX
- Data Warehouse Repository (multidimensional database)
- Data Source (relational database, flat files)
- Script
  - Populates the database for the dimensional model
  - Regenerates the aggregate tables
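The two jobs of the script layer can be sketched as follows. This is an illustrative Python version that uses an in-memory SQLite database as a stand-in for the real setup (the slides mention a PHP script, MS-SQL, and MySQL); the table names, column names, and rows are assumptions.

```python
import sqlite3

def populate_and_aggregate(source_rows):
    """Load source rows into a fact table, then rebuild an aggregate table."""
    db = sqlite3.connect(":memory:")
    # Step 1: populate the dimensional model (a single fact table here).
    db.execute(
        "CREATE TABLE fact_accession (genus TEXT, country TEXT, accessions INTEGER)"
    )
    db.executemany("INSERT INTO fact_accession VALUES (?, ?, ?)", source_rows)
    # Step 2: regenerate the aggregate table from scratch on every run.
    db.execute("DROP TABLE IF EXISTS agg_by_genus")
    db.execute(
        "CREATE TABLE agg_by_genus AS "
        "SELECT genus, SUM(accessions) AS accessions "
        "FROM fact_accession GROUP BY genus"
    )
    db.commit()
    return db

# Hypothetical rows standing in for an extract from the relational source.
rows = [("Solanum", "Peru", 120), ("Solanum", "Bolivia", 30), ("Ipomoea", "Peru", 45)]
db = populate_and_aggregate(rows)
totals = dict(db.execute("SELECT genus, accessions FROM agg_by_genus"))
```

Rebuilding the aggregate table on each run keeps it consistent with the fact table, at the cost of recomputing totals that may not have changed.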

Mondrian engine versions
- Version 1.0: 2004
- Version 2.0: August 2005
- Version 2.1: March 2006
- Version 2.4: March 2008
Precomputed totals:
- Created when the first user runs a query; stored in a temporary cache
- Created when the model database is created; stored in database tables
Mondrian is a component of a business intelligence (BI) framework.
Mondrian: http://mondrian.pentaho.org/

Mondrian, a Component of Pentaho
Open source components for a Research Intelligence (RI) platform. Pentaho: http://www.pentaho.org/
- Reporting (Eclipse Project & Jasper)
- Analysis
- Data Mining
- Workflow

Data Mart Types Part I
In the OLAP world, there are two main types: Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP). Hybrid OLAP (HOLAP) refers to technologies that combine MOLAP and ROLAP.
MOLAP
This is the more traditional form of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in a relational database but in proprietary formats.
Advantages:
- Excellent performance: MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
- Can perform complex calculations: All calculations are pregenerated when the cube is created. Hence, complex calculations are not only doable, but they also return quickly.
Disadvantages:
- Limited in the amount of data it can handle: Because all calculations are performed when the cube is built, it is not practical to include a large amount of data in the cube itself. The cube can still be derived from a large amount of data, but in that case only summary-level information is included in the cube itself.
- Requires additional investment: Cube technologies are often proprietary and do not already exist in the organization. Adopting MOLAP technology therefore usually requires additional investment in human and capital resources.
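The pregeneration described above can be illustrated with a toy cube in Python. This is a sketch of the general idea, not of any vendor's storage format; `build_cube`, the dimension names, and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_cube(rows, dimensions, measure):
    """Precompute totals for every subset of dimensions (a tiny MOLAP-style cube)."""
    cube = defaultdict(int)
    for row in rows:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                # Key: which dimensions are fixed, and to which values.
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row[measure]
    return dict(cube)

# Hypothetical sample data.
rows = [
    {"genus": "Solanum", "country": "Peru", "accessions": 120},
    {"genus": "Solanum", "country": "Bolivia", "accessions": 30},
    {"genus": "Ipomoea", "country": "Peru", "accessions": 45},
]
cube = build_cube(rows, ("genus", "country"), "accessions")

# Queries are now plain lookups; nothing is computed at query time.
grand_total = cube[((), ())]
solanum_total = cube[(("genus",), ("Solanum",))]
peru_total = cube[(("country",), ("Peru",))]
```

With n dimensions the cube holds totals for 2^n groupings, which is one way to see why precomputing everything limits how much detail a cube can carry.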

Data Mart Types Part II
ROLAP
This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP slicing and dicing functionality. In essence, each slicing or dicing action is equivalent to adding a "WHERE" clause to the SQL statement.
Advantages:
- Can handle large amounts of data: The data size limitation of ROLAP technology is that of the underlying relational database. In other words, ROLAP itself places no limit on the amount of data.
- Can leverage functionality inherent in the relational database: Relational databases often come with a host of built-in functionality. Because ROLAP technologies sit on top of the relational database, they can leverage it.
Disadvantages:
- Performance can be slow: Because each ROLAP report is essentially a SQL query (or multiple SQL queries) against the relational database, query time can be long if the underlying data is large.
- Limited by SQL functionality: ROLAP technology relies mainly on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations in SQL). ROLAP technologies are therefore traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building complex out-of-the-box functions into their tools and by allowing users to define their own functions.
HOLAP
HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance. When detail information is needed, HOLAP can "drill through" from the cube into the underlying relational data.
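The statement that each slice adds a "WHERE" clause can be sketched in Python with an in-memory SQLite database. This is not how any particular ROLAP engine generates SQL; `rolap_query`, the table layout, and the data are assumptions made for illustration.

```python
import sqlite3

ALLOWED_COLUMNS = {"genus", "country", "year"}

def rolap_query(db, group_by, slicers):
    """Build and run a GROUP BY query; each slicer adds one WHERE condition."""
    if group_by not in ALLOWED_COLUMNS or not set(slicers) <= ALLOWED_COLUMNS:
        raise ValueError("unknown column")
    sql = f"SELECT {group_by}, SUM(accessions) FROM fact_accession"
    if slicers:
        # Values are bound as parameters; only vetted column names are inlined.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in slicers)
    sql += f" GROUP BY {group_by} ORDER BY {group_by}"
    return db.execute(sql, list(slicers.values())).fetchall()

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE fact_accession "
    "(genus TEXT, country TEXT, year INTEGER, accessions INTEGER)"
)
# Hypothetical sample data.
db.executemany(
    "INSERT INTO fact_accession VALUES (?, ?, ?, ?)",
    [
        ("Solanum", "Peru", 2007, 120),
        ("Solanum", "Bolivia", 2007, 30),
        ("Ipomoea", "Peru", 2007, 45),
        ("Ipomoea", "Peru", 2008, 10),
    ],
)

all_by_genus = rolap_query(db, "genus", {})
peru_2007 = rolap_query(db, "genus", {"country": "Peru", "year": 2007})
```

Every request goes back to the relational tables, which matches both the advantage (no separate size limit) and the disadvantage (query time grows with the data) listed above.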

Multidimensional Model Elements Part I
- Dimension: A category of information. For example, the taxonomy dimension.
- Hierarchy levels: The specification of levels that represents the relationship between different attributes within a hierarchy. For example, one possible hierarchy in the Taxonomy dimension is Family --> Genus --> Series --> Species.
- Fact table: A table that contains the measures of interest. For example, accession count would be such a measure. The measure is stored in the fact table at the appropriate granularity.
A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more lookup tables, but fact tables do not have direct relationships to one another. Dimensions and hierarchies are represented by lookup tables. Attributes are the non-key columns in the lookup tables.
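A small Python sketch, using SQLite, shows how these elements fit together: a taxonomy lookup table carries the hierarchy levels as attributes, and a fact table holds the accession count. The table names, the `roll_up` helper, and the rows are illustrative assumptions, not the CIP schema.

```python
import sqlite3

LEVELS = ("family", "genus", "series", "species")  # hierarchy from the slide

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_taxonomy (
    taxonomy_id INTEGER PRIMARY KEY,
    family TEXT, genus TEXT, series TEXT, species TEXT
);
CREATE TABLE fact_accession (
    taxonomy_id INTEGER REFERENCES dim_taxonomy(taxonomy_id),
    accessions INTEGER
);
""")
# Illustrative rows only; not an authoritative taxonomy.
db.executemany(
    "INSERT INTO dim_taxonomy VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Solanaceae", "Solanum", "Tuberosa", "Solanum tuberosum"),
        (2, "Solanaceae", "Solanum", "Acaulia", "Solanum acaule"),
        (3, "Convolvulaceae", "Ipomoea", "Batatas", "Ipomoea batatas"),
    ],
)
db.executemany(
    "INSERT INTO fact_accession VALUES (?, ?)",
    [(1, 100), (1, 20), (2, 30), (3, 45)],
)

def roll_up(db, level):
    """Total the accessions measure at one level of the taxonomy hierarchy."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    sql = (
        f"SELECT d.{level}, SUM(f.accessions) "
        "FROM fact_accession f "
        "JOIN dim_taxonomy d ON d.taxonomy_id = f.taxonomy_id "
        f"GROUP BY d.{level} ORDER BY d.{level}"
    )
    return db.execute(sql).fetchall()

by_family = roll_up(db, "family")
by_species = roll_up(db, "species")
```

Moving from `by_family` to `by_species` is a drill-down along the hierarchy; the fact table is untouched and only the grouping attribute changes.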

Multidimensional Model Elements Part II
In designing data models for data warehouses and data marts, the most commonly used schema types are the Star Schema and the Snowflake Schema.
Star Schema
In the star schema design, a single object (the fact table) sits in the middle and is radially connected to the surrounding objects (dimension lookup tables), like a star. A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table.
Snowflake Schema
The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joins against smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance effort needed because of the increased number of lookup tables.
Whether one uses a star or a snowflake largely depends on personal preference and business needs.
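The structural difference between the two schemas can be sketched in Python with SQLite: the same taxonomy dimension is modeled once as a single lookup table (star) and once as one table per level (snowflake). All table names and rows are hypothetical, and the sketch makes no claim about relative performance.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Star: one denormalized lookup table for the dimension.
CREATE TABLE star_taxonomy (
    taxonomy_id INTEGER PRIMARY KEY, species TEXT, genus TEXT, family TEXT
);
-- Snowflake: the same dimension split into one lookup table per level.
CREATE TABLE sf_family (family_id INTEGER PRIMARY KEY, family TEXT);
CREATE TABLE sf_genus (
    genus_id INTEGER PRIMARY KEY, genus TEXT,
    family_id INTEGER REFERENCES sf_family(family_id)
);
CREATE TABLE sf_species (
    taxonomy_id INTEGER PRIMARY KEY, species TEXT,
    genus_id INTEGER REFERENCES sf_genus(genus_id)
);
CREATE TABLE fact_accession (taxonomy_id INTEGER, accessions INTEGER);

INSERT INTO star_taxonomy VALUES
    (1, 'Solanum tuberosum', 'Solanum', 'Solanaceae'),
    (2, 'Ipomoea batatas', 'Ipomoea', 'Convolvulaceae');
INSERT INTO sf_family VALUES (1, 'Solanaceae'), (2, 'Convolvulaceae');
INSERT INTO sf_genus VALUES (1, 'Solanum', 1), (2, 'Ipomoea', 2);
INSERT INTO sf_species VALUES (1, 'Solanum tuberosum', 1), (2, 'Ipomoea batatas', 2);
INSERT INTO fact_accession VALUES (1, 120), (1, 30), (2, 45);
""")

# Star: one join from the fact table to the dimension.
STAR_SQL = """
SELECT d.family, SUM(f.accessions)
FROM fact_accession f
JOIN star_taxonomy d ON d.taxonomy_id = f.taxonomy_id
GROUP BY d.family ORDER BY d.family
"""

# Snowflake: one join per level between the fact table and the family name.
SNOWFLAKE_SQL = """
SELECT fa.family, SUM(f.accessions)
FROM fact_accession f
JOIN sf_species s ON s.taxonomy_id = f.taxonomy_id
JOIN sf_genus g ON g.genus_id = s.genus_id
JOIN sf_family fa ON fa.family_id = g.family_id
GROUP BY fa.family ORDER BY fa.family
"""

star_result = db.execute(STAR_SQL).fetchall()
snowflake_result = db.execute(SNOWFLAKE_SQL).fetchall()
```

Both queries return the same totals. The snowflake version stores each family and genus name once but needs more lookup tables and joins, which is the maintenance trade-off the slide describes.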

Case Study for ICIS Inventory (IMS) Database

Case Study for CIP Passport Database

Case Study for CIP Field & Molecular Experiments Database

Case Study for ICIS Genealogy (GMS) Database (2 million germplasm records)

Demos
- Demo based on Microsoft tools
- Demo based on the Mondrian tool