Data Vault & Pentaho in Healthcare. Kasper de Graaf, Aly Hollander





St. Antonius Ziekenhuis
- Nieuwegein / Utrecht
- 3 locations
- 5,000 employees
- 1,100 beds
- 33 specialties
- 250 specialists
- 150 junior doctors

Santeon Hospitals
1. Canisius-Wilhelmina Ziekenhuis
2. Catharina Ziekenhuis
3. Martini Ziekenhuis
4. Medisch Spectrum Twente
5. Onze Lieve Vrouwe Gasthuis (OLVG)
6. St. Antonius Ziekenhuis

Healthcare is all about data
- Patient files, diagnostics, R&D
- DOT, care activities, appointments, procurement
- Like an ordinary business: not more difficult, but complex

EPR (Electronic Patient Record) system for Sint Antonius Ziekenhuis
- Maintains and improves the quality of care
- Developed and maintained by the ICT department
- Based on web technology, open standards and open source software
- Supports the primary health care process
- Modular system that can be tuned for various user profiles: doctors, nurses or other health professionals
- More info: www.intrazis.org

IntraZis, Data Warehouse & Pentaho
- 2010: more demand for (management) information
- IntraZis is not suitable for extensive queries
- The ICT department starts a DWH and BI project
- MySQL and Pentaho were chosen, in the tradition of in-house development and open source

Data Vault & Pentaho

Data Vault ETL issues
- Many objects to load (hubs, links, satellites)
- Automation is (almost) required
- We did NOT want to:
  - Get rid of ETL tooling
  - Code the ETL ourselves
  - Manage too many ETL objects (so no generation of ETL mappings or transformations)

Our Solution
- Use metadata to drive generic ETL transformations

Result: The Kettle Data Vault Framework
- A set of generic ETL transformations
- Driven by metadata (currently in XLS; loaded into a MySQL database)
- A couple of configuration files
- A couple of ETL jobs and transformations to tie it all together
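To make the idea of metadata-driven generic transformations concrete, here is a minimal Python sketch. It is not the framework's Kettle implementation: the `HubMapping` fields, the `load_hub` function, the table and column names, and the in-memory dictionaries standing in for staging and hub tables are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubMapping:
    """Hypothetical metadata row: which staging column feeds which hub."""
    hub_name: str
    source_table: str
    business_key_column: str
    record_source: str


def load_hub(mapping, staging, hubs, load_dts):
    """Generic hub load: insert business keys that the hub does not hold yet.

    One function serves every hub; only the metadata row differs.
    Returns the number of newly inserted business keys.
    """
    hub = hubs.setdefault(mapping.hub_name, {})
    inserted = 0
    for row in staging[mapping.source_table]:
        key = row[mapping.business_key_column]
        if key not in hub:
            hub[key] = {"load_dts": load_dts, "record_source": mapping.record_source}
            inserted += 1
    return inserted


# Illustrative data only; table and column names are made up.
staging = {
    "stg_patient": [{"patient_no": "P1"}, {"patient_no": "P2"}, {"patient_no": "P1"}],
    "stg_appointment": [{"appointment_id": "A1"}],
}
metadata = [
    HubMapping("hub_patient", "stg_patient", "patient_no", "EPR"),
    HubMapping("hub_appointment", "stg_appointment", "appointment_id", "EPR"),
]
hubs = {}
counts = [load_hub(m, staging, hubs, "2013-01-01 06:00:00") for m in metadata]
```

Adding a new hub in this sketch means adding one metadata row rather than writing a new transformation, which is the property the slide attributes to the framework.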

The Architecture
[Architecture diagram. Labels: sources (Files, CSV Files, ERP, MySQL DBMS); ETL; Staging Area; ETL Process: Kettle Data Vault Framework; MySQL Data Vault; ETL; Central DWH & Data Marts; Data Warehouse; EUL]

Data Vault Size
- Approx. 125 tables (excluding error and helper tables)
- 40 GB of data
- Largest table: 42 million rows
- Total rows: 160 million
- Refresh rate: twice a day
- Growth: approx. 100,000 rows daily
- New tables: varies strongly (new functionality is added on a project basis)

Advantages of Data Vault for us
- Full traceability of history (a DBC changes rapidly over time; we often see more than 30 versions)
- The data model is very extensible (incorporating new source systems)
- Business rules are moved downstream (and change often)
- Generic solution saves us a lot of testing

The Tooling
- Database: MySQL
- ETL: Pentaho Data Integration (Kettle)
- BI: Pentaho

Automation?
- Staging physical database & loading: can be automated, but currently not part of the framework
- Data vault design: manual
- Data vault physical database: manual
- Mapping from source to data vault (Excel sheet): manual
- Data vault population: automated using the framework
- Data marts & BI: manual

So what does this Framework do?
- Automatically populates the entire data vault data warehouse
- Generates logging
- Inserts error rows into special error tables
- Is restartable (using the load_dts of the previous run)
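The slides do not say how the framework uses the previous run's load_dts or how it fills its error tables. The following Python sketch shows one possible reading: only staging rows newer than the previous load_dts are processed, and rows that fail a check are diverted to an error list instead of aborting the run. The function names, the `business_key` and `error_message` fields, and the validation rule are assumptions for illustration.

```python
def run_load(rows, previous_load_dts, transform):
    """Process staging rows newer than the previous run; divert failures.

    rows: iterable of dicts with a "load_dts" key (ISO-formatted strings
    compare correctly as text). previous_load_dts: None on the first run.
    Returns (loaded, errors).
    """
    loaded, errors = [], []
    for row in rows:
        if previous_load_dts is not None and row["load_dts"] <= previous_load_dts:
            continue  # already handled by an earlier run
        try:
            loaded.append(transform(row))
        except ValueError as exc:
            # Keep the offending row together with the reason it failed.
            errors.append({**row, "error_message": str(exc)})
    return loaded, errors


def require_key(row):
    """Hypothetical rule: a row without a business key is an error row."""
    if not row.get("business_key"):
        raise ValueError("missing business key")
    return row


# Illustrative rows only.
rows = [
    {"business_key": "P1", "load_dts": "2013-01-01 06:00:00"},
    {"business_key": "P2", "load_dts": "2013-01-01 18:00:00"},
    {"business_key": None, "load_dts": "2013-01-01 18:00:00"},
]
loaded, errors = run_load(rows, "2013-01-01 06:00:00", require_key)
```

Because the cut-off is a stored timestamp rather than in-memory state, rerunning after a failure with the same `previous_load_dts` processes the same set of rows again.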

Some design decisions
- Updateable views with generic column names
- Compare satellite attributes using string comparison (concatenate all columns, with | (pipe) as delimiter)
- 'Inject' the metadata using Kettle parameters
- Generate and use an error table for each Data Vault table
- Check for design errors (e.g. references to non-existent tables, connections, attributes)
- Parallel processing
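The satellite comparison by concatenated string can be sketched in a few lines of Python. This is an illustration of the technique named on the slide, not the framework's own code; the function names, the handling of NULL values as empty strings, and the sample attributes are assumptions.

```python
def attribute_string(row, columns, delimiter="|"):
    """Concatenate the satellite attributes into one comparable string."""
    return delimiter.join("" if row[c] is None else str(row[c]) for c in columns)


def satellite_changed(current, incoming, columns):
    """True when no current satellite row exists or any attribute differs."""
    if current is None:
        return True
    return attribute_string(current, columns) != attribute_string(incoming, columns)


# Illustrative attribute values only.
columns = ["name", "city", "beds"]
current = {"name": "St. Antonius", "city": "Nieuwegein", "beds": 1100}
same = {"name": "St. Antonius", "city": "Nieuwegein", "beds": 1100}
moved = {"name": "St. Antonius", "city": "Utrecht", "beds": 1100}
```

A single string comparison keeps the transformation generic, since it works for any number of columns. Two caveats of this sketch: NULL and the empty string become indistinguishable, and a value that itself contains the delimiter can hide a change (for example "a|b", "c" and "a", "b|c" produce the same string).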

Supported constructs
- Hubs, links, satellites
- Multi-source hubs and links
- Last_seen_dts (hubs and links)
- Link attributes (an attribute in a link that references a hub that is not modeled, like orderline)
- Link validity satellites (special satellites that, among other things, keep track of deleted link rows)
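How a link validity satellite can track deleted link rows is sketched below in Python. The slides do not describe the framework's actual logic, so this is an assumed approach: each run compares the link keys seen in the source with the open validity records, closes records for links that have disappeared, and opens a new record for links that appear or reappear. The record layout (`valid_from`, `valid_to`) and the function name are hypothetical.

```python
def update_link_validity(validity, seen_link_keys, load_dts):
    """Open or close validity records based on the links seen in this run.

    validity: dict mapping link key -> list of {"valid_from", "valid_to"}.
    seen_link_keys: set of link keys present in the source for this run.
    """
    # Open a record for links that are new or that reappeared after deletion.
    for key in seen_link_keys:
        records = validity.setdefault(key, [])
        if not records or records[-1]["valid_to"] is not None:
            records.append({"valid_from": load_dts, "valid_to": None})
    # Close the open record of every link that is no longer in the source.
    for key, records in validity.items():
        if key not in seen_link_keys and records[-1]["valid_to"] is None:
            records[-1]["valid_to"] = load_dts
    return validity


# Three illustrative runs: link "B" is deleted in run 2 and returns in run 3.
validity = {}
update_link_validity(validity, {"A", "B"}, "2013-01-01")
update_link_validity(validity, {"A"}, "2013-01-02")
update_link_validity(validity, {"A", "B"}, "2013-01-03")
```

After these runs, link "A" has one open record, while link "B" has a closed record followed by a new open one, so the deletion remains visible in history.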

Not (yet) supported constructs
- Composite business keys in a hub (can be solved using concatenation)
- Link-to-link relationships
- Multi-active satellites
- CDC-like staging areas
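The concatenation workaround for composite business keys might look like the following Python sketch. The delimiter, the rejection of NULL key parts, and the column names are assumptions; the slides only state that concatenation can be used.

```python
def composite_business_key(row, key_columns, delimiter="|"):
    """Build a single business key string from several source columns."""
    parts = []
    for column in key_columns:
        value = row[column]
        if value is None:
            # A partial key would be ambiguous, so refuse to build one.
            raise ValueError(f"key column {column!r} is NULL")
        parts.append(str(value).strip())
    return delimiter.join(parts)


# Illustrative row only; "SAZ" and the column names are made up.
row = {"hospital": "SAZ", "patient_no": 123, "name": "example"}
key = composite_business_key(row, ["hospital", "patient_no"])
```

The column order has to be fixed in the mapping, because the same values concatenated in a different order yield a different key.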

Metadata tables

Example: metadata and Excel

Example: a complete run

Example: transformation for a hub

Final remarks
- PDI framework & data vault have now been operational for more than 2 years, still growing and still going strong
- Generic solution saves an enormous amount of time (both development and testing)
- Generic solution is a bit harder to maintain and debug
- Luckily, maintenance is now close to zero
- Mistakes in the design sheet are easily made; we're considering a specialized tool (Talend Master Data)

Want to try?
- The PDI framework is open source!
- Download a fully operational Virtual Machine at: http://sourceforge.net/projects/pdidatavaultfw/
- Developer: Edwin Weber (eacweber@gmail.com)
