DNV presentation at Norsk Informatica Brukerforum
Experiences and solution strategies from DNV's use of Informatica
Jan Petter Holmberg and Kristian Ramsrud
DNV's main services
Highly skilled people across the world
- 300 offices
- 100 countries
- 9,000 employees, of whom 82% have a university degree
BICC organisation and roles
[Organisation chart. Recoverable labels:]
- Business areas: HR, Finance, Divisions, External services
- Business roles: Business Owner, Business spec, Business process, Analyst, Report Author, Report consumers
- BICC (core): Business Advisor, Package developer, BI Architect, DWH Architect, DWH developer
- BICC (virtual): Analyst
DNV data centred services: common platform
- Standardised data capture and storage for all solutions
- Standardised processing and presentation for all solutions, and internal DNV use
- Standardised portals and presentation for all solutions
- Standardised export formats to integrate with customer systems
Data sources:
1. Interview forms
2. Data entry forms
3. Sensor/voyage recorder data import
4. Data from partners
5. Data from DNV processes
6. Purchased data
7. Web traffic
[Diagram labels: DNV DB 1, DNV Datawarehouse, Vadis (Cognos), Customer portal 1-n, Customer Performance management system]
BI services
[Diagram labels: KM support; BI Competence & processes; Quality management support; Ext services support; Cognos; Production support; Finance support; Data Warehouse; Data Q. support; Capacity & competence; Market Intelligence; Efficiency; Management support; Exp/imp support]
[Chart: DNV contracts per year. Number of vessels as of 2008.06.01, by contract year 2000 to 2008; vertical axis 0 to 800.]
External services
High level BI architecture
[Architecture diagram. Recoverable labels:]
- Metadata: technical metadata, business metadata
- Source systems: internal data (NP, Agresso, Affinitas); external data (DNVX, LRF; X, AI, Y, Z via integration of external data sources)
- Integration: EAI, publish-subscribe
- Data population: ETL
- DNV DWH: Datamart Certificate, Datamart Finance, Datamart Fleet, Datamart HR
- Data access (Vadis, BI Portal): std. report, ad-hoc report, perf. management, analytic, planning & consolidation, MS Office
- Portals: DNV Inside, SharePoint (NGWP), customers' own portals
- External/internal applications: NP, Affinitas, other internal, external
- Security: authentication, authorization, connection
- Operations: capacity, transport, schedule, audit, error handling, backup/archive, performance
System landscape
- In production Jan 2009, branded as Vadis
- 200 reports
- Already 1000 distinct users
- Components: Cognos 8 BI, Planning, Consolidation, Metric Studio
- ETL: 1640 daily run sessions
- DW: SQL Server, 1.6 TB, 110 fact tables, 260 dimensions
- Used for some source system specific BI services
Technical details
- Dev, Test and Prod environments on both PowerCenter and SQL Server
- PowerCenter 8.6.1; upgrade to PowerCenter 9 before summer
- SQL Server 2005; upgrade to SQL Server 2008 R2 before summer
- From March 24 core CPU, 164 GB memory on SQL Servers
- Supplementary PowerCenter modules: RealTime, Informatica Data Quality
Use of Informatica in DNV
[Architecture diagram. Recoverable labels:]
- Source systems: internal data (NP, Agresso, Affinitas); external data (LRF; X, AI, Y, Z via integration of external data sources)
- Realtime integrations
- Batch integrations: ETL
- DNV DWH: Datamart Certificate, Datamart Finance, Datamart Fleet, Datamart HR
Accessing source systems
- Web service
- Export tables
- Replicated database
- Views with data manipulation
- 1:1 views on base tables
- Base tables
- CDC
- Folder or FTP
- Integration hub
- Flat file
Accessing source systems
[Diagram: access methods arranged by level of data aggregation/manipulation]
- Web service
- Export tables
- Views with data manipulation and/or aggregation
- 1:1 views on base tables
- Base tables
- CDC
Diagram annotations:
- Higher uncertainty, less flexibility and possibly more maintenance
- Need for communication between source system developers and the Data Warehouse team
Data access - Preferences and Requirements
- Transactional data are required
- Reliable timestamps (if available)
- Consistent keys, also after source system conversions
- Untouched data
- Complete data sets
From a data warehouse point of view, we prefer access to the base tables. All ways of accessing source system data have elements of risk. Navigating the landscape of control, flexibility and stability is a political process. Standard methods for accessing data with corporate support.
Tailor-made solutions in DNV
[Diagram labels: Sys1, Sys2, Sys3, DW]
As the number of dependencies increases, the number of threads and decision points becomes difficult to deal with.
Ensuring data consistency
- Large number of sessions
- One consolidated data warehouse
- Data quality and integrity?
- What happens if one source system is down or a session fails?
Informatica's integrated workflow tools:
- Decision points
- Threads with conditions
- Demanding to maintain when the number of sessions and dependencies grows
DNV's solution:
- All sessions write to a tailor-made log table in the data warehouse
- Table and session dependencies are registered
- Stored procedures: fail the session if the objects it depends on are not completed
- Dependent on the developer's input to the dependency system
Dealing with dependencies
[Flow diagram of the main workflow. Recoverable steps:]
- Log workflow start
- Check if last main workflow has completed
- Update log: set not completed sessions to Failed
- Session 1: log start, check dependencies; the session fails (x), so there is no end log
- Session 2: log start, check dependencies (will fail if Session 2 depends on Session 1)
- Log workflow end
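The workflow-start housekeeping on this slide could look roughly like the following. This is a sketch under assumptions: the `workflow_log` table, its columns and the function name are invented for illustration, and SQLite is used instead of SQL Server. A crashed session leaves a start entry without an end entry, so every entry still marked as started is set to Failed before the new run is logged.

```python
import sqlite3

# Hypothetical log table, not DNV's actual layout.
LOG_SCHEMA = """
CREATE TABLE workflow_log (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    kind   TEXT NOT NULL,   -- 'workflow' or 'session'
    status TEXT NOT NULL    -- 'Started', 'Completed' or 'Failed'
)
"""


def begin_workflow(conn, workflow):
    """Close out leftovers from the previous run, then log the new workflow start.

    Returns (previous_completed, failed_rows): whether the last run of this
    workflow completed, and how many open log entries were set to Failed.
    """
    last = conn.execute(
        "SELECT status FROM workflow_log"
        " WHERE kind = 'workflow' AND name = ? ORDER BY id DESC LIMIT 1",
        (workflow,),
    ).fetchone()
    previous_completed = last is None or last[0] == "Completed"

    # Anything still 'Started' never wrote its end log, so it did not complete.
    cursor = conn.execute("UPDATE workflow_log SET status = 'Failed' WHERE status = 'Started'")
    failed_rows = cursor.rowcount

    conn.execute(
        "INSERT INTO workflow_log (name, kind, status) VALUES (?, 'workflow', 'Started')",
        (workflow,),
    )
    return previous_completed, failed_rows
```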
Data delivery infrastructure
[Diagram of database areas: Customer access area, DW Reporting, DW Staging, DW Reporting. Contents per area, in extraction order:]
- Source data copy, log table, object dependencies, intermediate calculations, star schemas, stored procedures
- Log table, temp tables, star schemas, subscription tables, stored procedures
- Log table, temp tables, star schemas, stored procedures
Data delivery infrastructure: return of enriched data
[Diagram labels: Source system; Integrations: DW load (push); Integrations (pull); DW Staging; DW Reporting; Subscription tables; Web service]
Information needed around the clock
Global organisation
- Consistent data while data are loading
- Data have to be available
- Minimize the time window for loading the data that are used by reports
Data delivery
- Large number of sessions
- ETL time window is a limited resource
- Utilize the ETL load over time
- Deliver data as soon as they are ready
- Strategies for parallel load
Parallel load
[Diagram of load stages:]
- Common: load common dimensions
- Load source system 1, 2, 3 and 4 (source system specific objects), each followed by Publish
- Final: load objects depending on more than one source system, followed by Publish
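The staging in this diagram can be sketched as follows. In DNV's setup these steps are PowerCenter sessions and workflows; the Python below only illustrates the ordering, with hypothetical callables (`load_common`, `load_source`, `load_final`, `publish`) standing in for the real sessions and a thread pool standing in for parallel workflow branches.

```python
from concurrent.futures import ThreadPoolExecutor


def run_load(sources, load_common, load_source, load_final, publish):
    """Run the common load, then each source system in parallel, then the final load."""
    # Stage 1: common dimensions must be in place before anything else.
    load_common()

    def load_and_publish(source):
        # Each source system publishes as soon as its own load is ready.
        load_source(source)
        publish(source)
        return source

    # Stage 2: source system loads run side by side; an exception in any of
    # them is re-raised here and stops the final stage.
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        loaded = list(pool.map(load_and_publish, sources))

    # Stage 3: objects that depend on more than one source system.
    load_final(loaded)
    publish("final")
    return loaded
```

Publishing per source system is what lets data be delivered as soon as it is ready, instead of waiting for the slowest source.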
Disabling source systems when needed
[Same load diagram, with "Disabled" markers on the load of source system 4 and on the final load:]
- Common: load common dimensions; Publish
- Load source system 1, 2 and 3; Publish
- Load source system 4: disabled
- Final load: load objects depending on more than one source system; Publish
- Source system 4 is main data source; when it is disabled, source system 3 is main data source (use old 4 data)
- A table is used for toggling source systems on/off
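A toggle table of this kind might be read as in the sketch below. The table name `source_toggle`, its columns and both function names are assumptions made for illustration, with SQLite standing in for the warehouse database. The second function mirrors the fallback on the slide: if the preferred main source is switched off, another source becomes the main one and the previously loaded data stay in place.

```python
import sqlite3


def setup_toggle(conn, sources):
    # Hypothetical toggle table: one row per source system, enabled by default.
    conn.execute("CREATE TABLE source_toggle (source TEXT PRIMARY KEY, enabled INTEGER NOT NULL)")
    conn.executemany("INSERT INTO source_toggle VALUES (?, 1)", [(s,) for s in sources])


def set_enabled(conn, source, enabled):
    """Switch a source system on or off."""
    conn.execute(
        "UPDATE source_toggle SET enabled = ? WHERE source = ?",
        (1 if enabled else 0, source),
    )


def enabled_sources(conn):
    """Return the source systems whose loads should run."""
    rows = conn.execute("SELECT source FROM source_toggle WHERE enabled = 1 ORDER BY source")
    return [row[0] for row in rows]


def main_source(conn, preferred, fallback):
    """Pick the main data source, falling back when the preferred one is disabled."""
    return preferred if preferred in enabled_sources(conn) else fallback
```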
Publish data to report marts
[Diagram:]
- Session 1 loads Table1_tmp
- Check for normal load size
- SP(x): swap table names (Table1_tmp and Table1)
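The publish step can be sketched as below. The slides do not say how "normal load size" is judged, so the rule used here (the new table must hold at least a given fraction of the current row count) is an assumption, as are the function name and the `min_ratio` parameter. SQLite is used for a self-contained example; DNV's version is a stored procedure in SQL Server.

```python
import sqlite3


def publish(conn, table, min_ratio=0.5):
    """Swap <table>_tmp into place if its row count looks normal.

    Table names are interpolated into SQL, so they must be trusted identifiers.
    """
    tmp = f"{table}_tmp"
    old = f"{table}_old"
    new_rows = conn.execute(f'SELECT COUNT(*) FROM "{tmp}"').fetchone()[0]
    current_rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

    # Assumed sanity rule: refuse to publish a load that shrank too much.
    if new_rows < current_rows * min_ratio:
        raise ValueError(f"{tmp} has {new_rows} rows, expected at least {current_rows * min_ratio:.0f}")

    # Three renames swap the two names; a production version would run them
    # in a single transaction so reports never see a missing table.
    conn.execute(f'ALTER TABLE "{table}" RENAME TO "{old}"')
    conn.execute(f'ALTER TABLE "{tmp}" RENAME TO "{table}"')
    conn.execute(f'ALTER TABLE "{old}" RENAME TO "{tmp}"')
```

Because only names change, reports keep reading the old data until the swap, which keeps the window with inconsistent data short.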
Tailor-made stored procedures
Start Session
- Check dependencies; fail the session if the tables it depends on are not completed
- Write to DW log table: session start
End Session
- Write to DW log table: session end, statistical info
Truncate table
- Empty the table in a secure way (PowerCenter's truncate table option cannot be used due to the use of stored procedures)
Swap Tables
- Swap table names
- Check data integrity
ETL initiation
Scheduled load
- Nightly
- Multiple times per day
Cognos calls web services to initiate ETL load on a near-real-time basis
Wrap up
DNV has to deal with a large number of sessions that feed one consolidated data warehouse. We have tuned PowerCenter to fit these special requirements.
Questions or comments?
Safeguarding life, property and the environment
www.dnv.com