Data Vault at work
Does Data Vault fulfill its promise?
- Leading player on the Dutch energy market
- Approximately 1,000 employees
- Production capacity: 3,813 MW
- 20% of the total Dutch electricity production capacity
07-06-2013
GDF SUEZ Portfolio Management
Direct Cause
- Before: BO Universe directly on source systems
- Error in the conversion of gas to electricity detected in one of the reports
- Further investigation revealed that this error occurred in every report
- Fixing this error took several months -> unacceptable to the business
Requirements
- Validation of data quality
- Single point of definition for Business Rules
- Insight into calculations and origin of information
- Less dependency on IT
- Frequently changing source systems
- Short load times, quick responses
- Robust, documented and well-managed
-> Data Vault
DWH4GSPM Architecture
[Diagram] Source Systems -> Staging Area -> Data Layer -> Information Layer -> User Access Layer; Metadata
DWH4GSPM Tools & Techniques
- Data Vault for Data Layer
- Star Schemas for Data Marts
- Project approach: SCRUM
- Reporting tool: Business Objects BOXI
- ETL Tool: Business Objects Data Services (BODS)
- Metadata: Business Objects Metadata Manager
- Database: Oracle 10g
The promise of Data Vault
"The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework" (Bill Inmon)
- Source data is stored unaltered (1:1 with source system)
  - No integration
  - No cleansing
- Changes in source systems do not lead to changes in existing Data Vault tables
  - Only new tables or new rows
- When should Data Vault be used?
  - When accountability and traceability are important (legal obligations, audit by accountant)
  - When source systems change often
  - When Business Logic (formulas, enrichment, integration, etc.) changes often
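The "only new tables or new rows" promise can be illustrated with a minimal sketch. It assumes the usual Data Vault split into a hub (business keys) and a satellite (descriptive attributes with a load timestamp). The table names, the columns and the `volume_mwh` attribute are hypothetical and are not taken from the DWH4GSPM model. SQLite is used only to keep the example self-contained.

```python
import sqlite3


def create_vault(conn):
    # Hypothetical hub and satellite; names are illustrative only.
    conn.executescript("""
        CREATE TABLE hub_contract (
            contract_key  TEXT PRIMARY KEY,
            load_ts       TEXT NOT NULL,
            record_source TEXT NOT NULL
        );
        CREATE TABLE sat_contract (
            contract_key TEXT NOT NULL REFERENCES hub_contract(contract_key),
            load_ts      TEXT NOT NULL,
            volume_mwh   REAL,
            PRIMARY KEY (contract_key, load_ts)
        );
    """)


def load_contract(conn, key, load_ts, volume_mwh, source="SRC_A"):
    # Hub: add the business key once; existing rows are never updated.
    conn.execute(
        "INSERT OR IGNORE INTO hub_contract VALUES (?, ?, ?)",
        (key, load_ts, source),
    )
    # Satellite: compare with the latest stored value for this key.
    latest = conn.execute(
        "SELECT volume_mwh FROM sat_contract "
        "WHERE contract_key = ? ORDER BY load_ts DESC LIMIT 1",
        (key,),
    ).fetchone()
    # A changed value becomes a new row; the old row stays as history.
    if latest is None or latest[0] != volume_mwh:
        conn.execute(
            "INSERT INTO sat_contract VALUES (?, ?, ?)",
            (key, load_ts, volume_mwh),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_vault(conn)
    load_contract(conn, "C1", "2013-06-01", 100.0)
    load_contract(conn, "C1", "2013-06-02", 100.0)  # unchanged: no new row
    load_contract(conn, "C1", "2013-06-03", 120.0)  # changed: new row
    print(conn.execute("SELECT * FROM sat_contract ORDER BY load_ts").fetchall())
```

In this sketch a new attribute in a source system would be handled by creating an additional satellite table next to `sat_contract`, leaving the existing tables untouched.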
Data Vault: From ETL to ELT
[Diagram] ETL: Extract, Transform, Load: Staging Area -> EDW (3NF) -> Data Marts
[Diagram] ELT: Extract -> Staging Area -> Load -> Data Vault -> Transform -> Data Marts
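The difference in ordering can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's BODS implementation: the row layout, the function names and the `efficiency` values are hypothetical, and the gas-to-electricity rule is reduced to a single made-up multiplication.

```python
def build_mart(rows, efficiency):
    # Hypothetical business rule: convert gas volume to electricity.
    return [
        {"plant": r["plant"], "power_mwh": r["gas_mwh"] * efficiency}
        for r in rows
    ]


def etl(source_rows, efficiency):
    # ETL: the rule is applied before loading, so the warehouse
    # only ever holds the derived values.
    return build_mart(source_rows, efficiency)


def elt_load(source_rows):
    # ELT step 1: copy the source rows unaltered into the vault.
    return [dict(r) for r in source_rows]


def elt_transform(vault_rows, efficiency):
    # ELT step 2: apply the rule only when building the data mart.
    return build_mart(vault_rows, efficiency)


if __name__ == "__main__":
    source = [{"plant": "P1", "gas_mwh": 200.0}]
    vault = elt_load(source)
    print(elt_transform(vault, efficiency=0.5))
    # A corrected rule is re-applied to the stored raw rows.
    print(elt_transform(vault, efficiency=0.25))
```

Because the vault keeps the raw rows, a corrected business rule can be re-run over existing data without going back to the source systems. The cost is that all transformation logic moves to the Data Vault to Data Mart step.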
Does Data Vault live up to its promises?
- Source data is stored unaltered
  - Content
  - Structure
- Changes in source systems do not lead to changes in existing Data Vault tables
  - True, but it does lead to a more complex transformation from Data Vault to Data Mart
  - No change in total design and development effort, from Source System to Reporting tool
- When should Data Vault be used?
  - When accountability and traceability are important (legal obligations, audit by accountant)
    - True, if change in structure is acceptable
  - When source systems change often
    - Data Vault only brings small advantages
  - When Business Logic (formulas, enrichment, integration, etc.) changes often
    - True, no data has been lost due to former Business Logic
    - Not true, more complex transformations from Data Vault to Data Mart
Issues during project
- Refactoring
  - Incremental approach for Star Schemas does not work well
  - Leads to complex transformations from Data Vault to Data Mart
  - And to a frequently changing BO Universe
- Load times
  - Solution: Data Vault encourages parallel loads
- Low understanding of SCRUM by business and project team members
- No out-of-the-box support for automated testing
  - Test scripts could not keep up with frequently changing ETL
  - Data Vault is not yet supported by automated test tooling
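The parallel-load remedy for long load times can be sketched as follows. The sketch assumes that tables of the same kind have no dependencies on each other, so all hubs can be loaded side by side, followed by all satellites. `load_table` is a hypothetical placeholder for the real load job, and the table names are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def load_table(name):
    # Placeholder for a real load job (hypothetical).
    return f"{name}: loaded"


def load_in_parallel(table_names, max_workers=4):
    # Tables within one stage are independent, so run them concurrently.
    # pool.map returns results in the order of table_names.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_table, table_names))


def load_vault(hubs, satellites):
    # Stages run one after another; tables inside a stage run in parallel.
    log = []
    for stage in (hubs, satellites):
        log.extend(load_in_parallel(stage))
    return log


if __name__ == "__main__":
    for line in load_vault(["hub_contract", "hub_plant"], ["sat_contract"]):
        print(line)
```

Loading hubs before satellites is a conservative choice for this sketch; whether the stages themselves can also overlap depends on how keys are generated and whether foreign keys are enforced.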
Business Evaluation
- Response times are good
- Load takes too long
- Design of Star Schemas optimized for reporting
  - Do not support analysis well
- Change of source takes too much time
Is the traditional DWH architecture becoming obsolete?
[Diagram] Layers: Source Systems -> Staging Area -> Data Layer -> Intermediate Layer -> Information Layer -> User Access Layer
[Diagram] Other labels: Metadata; Near-Realtime Zone; Agile Development Zone; Data gathering; Information delivery
What will be the impact of recent developments?