Reflections on Agile DW by a Business Analytics Practitioner Werner Engelen Principal Business Analytics Architect
Introduction
  Werner Engelen
  Active in BI & DW since 1998 + 6 years at element61
  Previously: Oracle, PwC Consulting & IBM Global Business Services
  Proven track record in dimensional modeling, data quality, setting up BICCs, project methodologies, project management, quality assurance, business analysis & ETL design
  But you can also talk to me about photography, urban exploration & landscape design
Just jump & swim?
Offer some (re-usable) food for thought Data discovery Governance Sources ETL Architecture Model Automate
Agile BI?
  Waterfall BI: Requirements, Design, Code, Test
  Agile BI: rather than doing all of one thing at a time, agile BI teams do a little of everything all the time
Agile in a nutshell Sprint retrospective
All we want is... a dimensional model SALES DATE CUSTOMER REVENUE STORE PRODUCT PROMOTION
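As a minimal sketch of what such a dimensional model could look like, the snippet below builds a SALES star schema in an in-memory SQLite database. The table and column names (`dim_*`, `fact_sales`, `*_key`), the choice of revenue as the only measure and the sample rows are illustrative assumptions, not taken from the slide.

```python
import sqlite3

# Dimension names follow the slide; table/column names are hypothetical.
DIMENSIONS = ("date", "customer", "store", "product", "promotion")


def build_star_schema(conn):
    """Create one table per dimension plus a fact table referencing all of them."""
    for dim in DIMENSIONS:
        conn.execute(
            f"CREATE TABLE dim_{dim} ("
            f"{dim}_key INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
    foreign_keys = ", ".join(
        f"{dim}_key INTEGER NOT NULL REFERENCES dim_{dim}({dim}_key)"
        for dim in DIMENSIONS
    )
    conn.execute(f"CREATE TABLE fact_sales ({foreign_keys}, revenue REAL NOT NULL)")


def revenue_by(conn, dim):
    """Sum revenue per member of one dimension."""
    if dim not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dim}")
    rows = conn.execute(
        f"SELECT d.name, SUM(f.revenue) FROM fact_sales f "
        f"JOIN dim_{dim} d ON d.{dim}_key = f.{dim}_key "
        f"GROUP BY d.name ORDER BY d.name"
    ).fetchall()
    return dict(rows)


def load_sample(conn):
    """Insert a handful of made-up rows, only to exercise the schema."""
    members = {
        "date": ["2024-01"],
        "customer": ["consumer"],
        "store": ["Tokyo"],
        "product": ["bread", "soap"],
        "promotion": ["none"],
    }
    for dim, names in members.items():
        conn.executemany(
            f"INSERT INTO dim_{dim} (name) VALUES (?)", [(n,) for n in names]
        )
    # Fact rows: (date, customer, store, product, promotion, revenue)
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 1, 1, 1, 1, 10.0), (1, 1, 1, 1, 1, 20.0), (1, 1, 1, 2, 1, 5.0)],
    )
```

Calling `build_star_schema` and `load_sample` on `sqlite3.connect(":memory:")` and then `revenue_by(conn, "product")` slices the same fact by any of the surrounding dimensions, which is the point of the star shape.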
How do we ask questions?
  How do this month's sales by sales rep of nonfood products which we promoted to consumers in Japan compare with previous years?
  Callouts on the question: WHAT? WHEN? WHAT? WHO? HOW MANY? WHERE? WHY? WHEN? WHO?
BI model canvas
  Need for a common questions framework
  WHEN: When does it happen? (date, time period, timeline...)
  WHERE: Where does it happen? Where does it refer to? (location, store, facility...)
  HOW: How does it happen? How do we know it happened? How do we uniquely define an event? (transaction type, transaction identifier...)
  HOW MANY: How many/much is involved? How long does it take? (revenues, costs, quantities, durations...)
  WHY: Why does it happen? (cause, reason, promotion...)
  WHO: Who does what? Who else is involved? Who is organized how? (customer, employee, supplier, sales rep...)
  WHAT: What is involved? What is the value proposition? (product, service, resource...)
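One way to picture how a filled-in canvas feeds the design is sketched below. It assumes, as a simplification that the slide does not state, that HOW MANY answers become measure candidates and every other question type becomes a dimension candidate. The function name `canvas_to_model` and the example entries are hypothetical.

```python
# The seven question types of the canvas, in the order used on the slide.
SEVEN_W = ("WHEN", "WHERE", "HOW", "HOW MANY", "WHY", "WHO", "WHAT")


def canvas_to_model(canvas):
    """Split canvas answers into measure and dimension candidates.

    Assumption: HOW MANY lists measures, all other question types list
    dimension candidates. Empty question types are left out.
    """
    unknown = set(canvas) - set(SEVEN_W)
    if unknown:
        raise ValueError(f"not a canvas question type: {sorted(unknown)}")
    measures = list(canvas.get("HOW MANY", []))
    dimensions = {
        w: list(canvas[w]) for w in SEVEN_W if w != "HOW MANY" and canvas.get(w)
    }
    return {"measures": measures, "dimensions": dimensions}


# Hypothetical canvas for the sales question about promoted nonfood products.
example_canvas = {
    "WHEN": ["month", "year"],
    "WHO": ["sales rep", "customer type"],
    "WHAT": ["product category"],
    "WHY": ["promotion"],
    "WHERE": ["country"],
    "HOW MANY": ["sales amount"],
}
```

`canvas_to_model(example_canvas)` then gives a first list of candidates that can go on the product backlog and into the source-to-target design.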
Link business questions to design Product backlog Data model & Source to Target design
Governance?
  The scope of a BICC is not 100% of all BI related applications
  But: still a minimal insight & governance is required
  Each BI application can be defined within a certain category
  Define degree of governance by BICC for each category
    Mandatory deliverables? (at a certain point in time a departmental BI application might be promoted to a corporate BI application)
    How to approach a BI project (requirements...)
    Framework, standards, guidelines
    Naming conventions
    Tool set...
  Degree of governance, from 0% to 100%: Special-purpose BI applications, Departmental BI applications, Cross-functional / cross-departmental BI applications, Corporate BI applications
Possible architectural issues?
  Modify fact to lower granularity
  Modify leading sources
  Modify definitions
  Add dimensions
  Add fields
  Add history
  Modify functionality (transactional to accumulated snapshot)...
  With an inflexible architecture & data model: costs & time go up
Data model / architecture anticipation
  3-tier architecture, from SOURCE ORIENTED to TARGET ORIENTED
  IN: Get the data (extract): source, landing zone, staging area...
  KEEP ALL RELEVANT: Store the data (register): data warehouse (EDW, Data Vault, ODS, Kimball 1st level, Kimball granular, 3NF...)
  OUT: Present the data: data mart (Kimball, combination 1st & 2nd level; cubes...)
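The three tiers can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: rows are dictionaries, the middle tier is an append-only list, and the helper names (`extract`, `store`, `present`) and the `_source` / `_load_id` lineage columns are hypothetical.

```python
from collections import defaultdict


def extract(source_rows, source_name, load_id):
    """IN: copy source rows unchanged and tag them with lineage columns."""
    return [dict(row, _source=source_name, _load_id=load_id) for row in source_rows]


def store(warehouse, staged_rows):
    """KEEP ALL RELEVANT: append only, never overwrite earlier loads."""
    warehouse.extend(staged_rows)
    return warehouse


def present(warehouse, key, group_by, measure):
    """OUT: take the latest version of each record, then aggregate for the target."""
    latest = {}
    for row in warehouse:
        current = latest.get(row[key])
        if current is None or row["_load_id"] >= current["_load_id"]:
            latest[row[key]] = row
    totals = defaultdict(float)
    for row in latest.values():
        totals[row[group_by]] += row[measure]
    return dict(totals)
```

Because the middle tier keeps every load, a later request such as adding history or changing the grain only touches `present`, not the extraction or the stored data.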
ETL & modeling de-composition
  Breadth or depth?
  Split up ETL & modeling into smaller pieces
  Minimize ETL and modeling in early iterations
  De-composition helps in planning activities
  De-composition supports early feedback
  Breadth: simplified load of the most important dimension models; early feedback, earlier build of dependent systems
  Depth: complete load of one dimension at a time; early deployment of complete usable sub systems
Start with a small thingy?
  Divide dimension tables: FROM no history (current view only) TO include history
  Divide rows, group records by type: FROM subset of data (e.g. Customer: consumer, business) TO all types
  Divide rows: FROM 10% of data, via n%, TO 100%
  Divide by columns: FROM columns from source 1, via columns from source 2, TO all columns
  Data quality: FROM include only non-outliers TO include outliers
  ETL complexity: FROM simpler / earlier tasks TO complex tasks
  ETL refresh frequency: FROM one time load TO incremental load (monthly, daily)
  ETL transformations: FROM (raw) data directly TO aggregations and/or business rules applied
  ETL target layer: FROM source directly, via staging and presentation, TO BI tools (semantic layer)
  ETL degree of automation: FROM manual TO fully automated
  Subject area completeness: FROM most important star, dimension, attributes in a dimension TO all data model elements
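A few of these FROM/TO axes (row types, share of rows, columns) can be captured as a small per-iteration configuration. The sketch below is one possible shape, not a prescribed one: `IterationSlice`, `apply_slice` and the `type` column are hypothetical names, and taking every n-th row is a crude stand-in for "10% of data".

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class IterationSlice:
    """What one iteration loads; widen the fields in later iterations."""
    row_types: Optional[FrozenSet[str]]  # None means all types
    columns: Tuple[str, ...]             # columns delivered in this iteration
    keep_every: int = 1                  # 1 = all rows, 10 = roughly 10% of rows


def apply_slice(rows, slice_, type_column="type"):
    """Filter by record type, thin out the rows, then project the columns."""
    if slice_.keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    of_type = [
        r for r in rows
        if slice_.row_types is None or r[type_column] in slice_.row_types
    ]
    sampled = of_type[:: slice_.keep_every]
    return [{c: r[c] for c in slice_.columns} for r in sampled]


# Made-up customer rows: even ids are consumers, odd ids are businesses.
customers = [
    {"id": i, "type": "consumer" if i % 2 == 0 else "business",
     "name": f"customer {i}", "segment": "n/a"}
    for i in range(20)
]

first_iteration = IterationSlice(frozenset({"consumer"}), ("id", "name"), keep_every=2)
final_iteration = IterationSlice(None, ("id", "type", "name", "segment"))
```

The ETL code stays the same between iterations; only the slice definition moves from the FROM side to the TO side.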
Agile BI = data intensive
  Traditional BI: proven answers to known questions; high-value reporting
  Data discovery: functional data connection; early access to data; fast answers to new questions; short-term reporting; source of requirements; helps in prioritizing
  Data profiling: data quality insight; identify & confront with issues asap; source of requirements
  [Diagram arrows: specifies; drives need for new, adjusted, better BI content]
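To make the data profiling step concrete, here is a minimal sketch that reports row count, missing values and distinct values per column. It assumes rows arrive as dictionaries and treats `None` and the empty string as missing; both choices, and the function name `profile`, are assumptions for illustration.

```python
def profile(rows):
    """Return per-column counts that give a first data quality insight."""
    columns = sorted({column for row in rows for column in row})
    report = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        # Assumption: None and "" both count as missing.
        present = [v for v in values if v is not None and v != ""]
        report[column] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report
```

A column that should be a unique key but shows fewer distinct values than rows, or an attribute with many missing values, is the kind of issue to confront the data provider with as early as possible.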
Keep your consumers close by Keep your data providers even closer
When is BI impacted?
  Triggers named on the slide: releases, abstraction layers, screens, retention..., direct & indirect DML, relationships (current & historical), interface, data, processes, quality
  [Diagram: tables with PK, FK and UK columns in source database 1 and source database 2, linked by relationships and an interface]
Automation
  Get your act together before things start off
    Don't try certain things for the first time
    Have a near perfect way & means of working
  Keep it simple
  Automate the simple / repetitive things
    Metadata driven generation
    Focus on time consuming: e.g. source analysis, ETL & testing (unit, regression)
  Re-use
    Develop best practices & reuse (think big, start small)
  Focus on the more difficult processes
    E.g. gathering good requirements, complex dimensional models, business rules
  Welcome change, but
    Is your architecture fit enough? (which layers)
    Are your tools fit enough?
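Metadata driven generation can be as simple as rendering load statements from a source-to-target mapping. The sketch below assumes a hypothetical mapping layout (`source_table`, `target_table`, `columns` with `target` and `expression`) and hypothetical table names; the metadata is treated as trusted input, since it is pasted into the SQL text.

```python
import sqlite3


def generate_load_sql(mapping):
    """Render an INSERT ... SELECT statement from source-to-target metadata."""
    target_columns = ", ".join(col["target"] for col in mapping["columns"])
    source_expressions = ", ".join(col["expression"] for col in mapping["columns"])
    return (
        f"INSERT INTO {mapping['target_table']} ({target_columns}) "
        f"SELECT {source_expressions} FROM {mapping['source_table']}"
    )


# Hypothetical mapping for a customer staging load.
customer_mapping = {
    "source_table": "src_customer",
    "target_table": "stg_customer",
    "columns": [
        {"target": "customer_id", "expression": "id"},
        {"target": "customer_name", "expression": "TRIM(name)"},
        {"target": "country_code", "expression": "UPPER(country)"},
    ],
}
```

Adding a field then means adding one line of metadata and regenerating, rather than hand-editing each load; the generated statement can be run against any database connection, for example one opened with `sqlite3.connect`.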
Divide & conquer: De-composition is the key
Thank you Werner Engelen Principal Business Analytics Architect