Data-Warehouse & Big Data Testing at The End of the Food Chain Thomas Abinger, Georg Fischer, September 18th, 2014 Copyright 2014, Tricentis GmbH. All Rights Reserved. 1
Agenda
- DWH vs. Big Data: differentiation between Data Warehouses and Big Data
- Automated Testing in DWH Projects: set up tests and use Tosca iq
- Big Data Testing: Big Data and automated testing
- Summary: wrap-up
Data Warehouses vs. Big Data
Data Warehouse | Big Data
Large data volume | Operational database with a huge volume of data
Data updates and data archiving in defined intervals from the online DB | Online data
Row oriented | Column oriented / file based
SQL | NoSQL: non-relational DBs or JSON
Structured data | Structured and unstructured data
Central architecture | Distributed (HDFS) architecture
DWH Testing - Overview
[Diagram labels]
- Inputs: Primary Systems, Big Data
- ETL Stages: Extract, Transform, Load
- BI Stages: Consolidation, Aggregation, Reporting
- Output: Reports
- Data layers: Stage 0, Stage ..., Core DWH, Stage ..., Stage n
Customer Survey (scale from "I don't agree" to "I agree")
- Poor quality of data delivered to the DWH
- Limited regression testing of data processing along the DWH/BI
- Business departments are highly involved in manual testing of reports
DWH Testing: complex SQL Queries
[Diagram labels: Primary Systems and Big Data; ETL Stages (Extract, Transform, Load); BI Stages (Consolidation, Aggregation, Reporting); Reports; Stage 0 to Stage n; "Check SQL Query"]
- SQL queries: hyper complex
- Slow! Only a limited number of verifications is possible
- Who can understand and maintain this?
Data-Profiling
- Test: concerns the logic in Stage n
- Attributes: Product Category (Candy, Frozen Food, Beer, ...), Store (Stadthalle, Airport, Central Station)
- Business rules: concern the attributes, e.g. no Frozen Food at the Airport

Profile | Product Category | Store | Revenue | Tolerance of Deviation
Profile 1 | Beer | Stadthalle | 100.000 EUR | +/-10.000 EUR
Profile 2 | Frozen Food (FF) | Airport | 0 EUR | 0 EUR
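The profile idea on this slide can be sketched in a few lines of Python. This is only an illustration, not how Tosca implements it: the `Profile` class, the `check_profile` function and the sample rows are hypothetical, and the revenue figures are read as one hundred thousand and ten thousand EUR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """One expected revenue figure for a (product category, store) combination."""
    product_category: str
    store: str
    expected_revenue: float  # EUR
    tolerance: float         # allowed absolute deviation in EUR


def check_profile(profile, rows):
    """Sum the revenue of all matching rows and compare it with the expectation.

    Returns (passed, actual_revenue).
    """
    actual = sum(
        row["revenue"]
        for row in rows
        if row["product_category"] == profile.product_category
        and row["store"] == profile.store
    )
    passed = abs(actual - profile.expected_revenue) <= profile.tolerance
    return passed, actual


# Hypothetical rows as they might be read from Stage n.
rows = [
    {"product_category": "Beer", "store": "Stadthalle", "revenue": 60_000.0},
    {"product_category": "Beer", "store": "Stadthalle", "revenue": 45_000.0},
    {"product_category": "Candy", "store": "Airport", "revenue": 12_000.0},
]

profiles = [
    Profile("Beer", "Stadthalle", 100_000.0, 10_000.0),  # Profile 1
    Profile("Frozen Food", "Airport", 0.0, 0.0),         # Profile 2: business rule
]

results = {(p.product_category, p.store): check_profile(p, rows) for p in profiles}
for key, (passed, actual) in results.items():
    print(key, "OK" if passed else "DEVIATION", actual)
```

A profile with a revenue of 0 EUR and a tolerance of 0 EUR doubles as a business-rule test: any Frozen Food revenue booked at the Airport makes it fail.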
Landscaping
[Figure: Großglockner (Alps), view from south-west. 1 = Glocknerwand, 2 = Untere Glocknerscharte, 3 = Teufelshorn (left) / Glocknerhorn (right), 4 = Teischnitzkees, 5 = Großglockner, 6 = Kleinglockner, 7 = Stüdlgrat, 8 = Ködnitzkees, 9 = Adlersruhe]
Landscaping
[Figure: Großglockner (Alps), view from south-west, reduced to fewer labels. 1 = Glocknerwand, 2 = Untere Glocknerscharte, 3 = Teufelshorn (left) / Glocknerhorn (right), 5 = Großglockner, 6 = Kleinglockner]
Landscaping: Example Billa
[Chart: Reference revenue by product group and store, May 2014. Product groups: Cosmetic, Beer, Baked Goods, Fruit. Revenue axis from 0 k EUR to 600 k EUR; legend in bands of 100 k EUR, from 0-100 k EUR up to 500-600 k EUR.]
Landscaping: Example Billa
[Chart: Revenue by product group and store, May 2015. Product groups: Cosmetic, Beer, Baked Goods, Fruit. Revenue axis from 0 k EUR to 600 k EUR; legend in bands of 100 k EUR, from 0-100 k EUR up to 500-600 k EUR.]
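One way to read the two landscape charts is as a band-by-band comparison between a reference period and the current period. The sketch below assumes that reading. The function names, the store names and all revenue figures are hypothetical. Only the 100 k EUR band width comes from the chart legend.

```python
def band(revenue_eur, width=100_000):
    """Index of the revenue band: 0 for 0-100 k EUR, 1 for 100-200 k EUR, ..."""
    return int(revenue_eur // width)


def landscape(revenue_by_cell):
    """Map every (product group, store) cell to its revenue band."""
    return {cell: band(value) for cell, value in revenue_by_cell.items()}


def changed_cells(reference, current):
    """Cells whose band differs between the reference and the current landscape."""
    ref, cur = landscape(reference), landscape(current)
    return {
        cell: (ref.get(cell), cur.get(cell))
        for cell in sorted(ref.keys() | cur.keys())
        if ref.get(cell) != cur.get(cell)
    }


# Hypothetical revenue in EUR per (product group, store).
reference = {
    ("Beer", "Store A"): 250_000,
    ("Fruit", "Store A"): 120_000,
    ("Cosmetic", "Store B"): 480_000,
}
current = {
    ("Beer", "Store A"): 270_000,      # same band as the reference
    ("Fruit", "Store A"): 20_000,      # dropped by one band
    ("Cosmetic", "Store B"): 510_000,  # moved up by one band
}

changes = changed_cells(reference, current)
for cell, (before, after) in changes.items():
    print(cell, "band", before, "->", after)
```

Comparing bands instead of exact values is a deliberate coarsening: small fluctuations stay invisible, while shifts in the overall shape of the landscape stand out.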
Process Quality: Testing of Business Rules
- The DWH challenges are exactly the same in testing
- Business rules have grown over time: who knows them all?
- No contact person available
- Data consistency can be tested across the stages
Checks for DWH Testing
Vital Check
- Basic checks such as the number of data records and other parameters
- Key and join tests
- Tool support: Tosca DB Engine, predefined building blocks in Tosca TestCase Design
Delivery Check
- Column and dependency checks (business logic)
- Tool support: Tosca DB Engine, predefined building blocks in Tosca TestCase Design
Checks with Tosca iq
- Speed optimized
- Memory optimization for queries
- Variant records are shown
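The vital checks (record counts, key tests, join tests) can be illustrated with plain SQL. The sketch below uses Python's built-in `sqlite3` module instead of the Tosca DB Engine. All table and column names are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical Stage 0 table and two Core DWH tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stage0_sales (sale_id INTEGER, store_id INTEGER, revenue REAL);
    CREATE TABLE core_sales   (sale_id INTEGER, store_id INTEGER, revenue REAL);
    CREATE TABLE core_store   (store_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO stage0_sales VALUES (1, 10, 5.0), (2, 10, 7.5), (3, 20, 1.0);
    INSERT INTO core_sales   VALUES (1, 10, 5.0), (2, 10, 7.5), (3, 99, 1.0);
    INSERT INTO core_store   VALUES (10, 'Stadthalle'), (20, 'Airport');
""")


def scalar(sql):
    """Run a query that returns exactly one value."""
    return conn.execute(sql).fetchone()[0]


# Basic check: the number of records must survive the load.
row_count_matches = (
    scalar("SELECT COUNT(*) FROM stage0_sales")
    == scalar("SELECT COUNT(*) FROM core_sales")
)

# Key test: no sale_id may occur more than once in the core table.
duplicate_keys = scalar("""
    SELECT COUNT(*) FROM (
        SELECT sale_id FROM core_sales GROUP BY sale_id HAVING COUNT(*) > 1
    )
""")

# Join test: every sale must reference an existing store.
orphans = conn.execute("""
    SELECT s.sale_id
    FROM core_sales s
    LEFT JOIN core_store d ON d.store_id = s.store_id
    WHERE d.store_id IS NULL
""").fetchall()

print("row counts match:", row_count_matches)
print("duplicate keys:", duplicate_keys)
print("orphaned sales:", orphans)
```

In this sample the record count and the key test pass, while the join test reports sale 3, which points to a store that does not exist. This is the kind of deviating record a delivery or vital check is meant to surface.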
TOSCA iq: Operating Principle
[Diagram labels: Profiles; Physical Queries; TOSCA iq; test cases TC1 to TC10, two of them marked "Causes Error"; result for TC9: record set ID xyz123456]
Big Data
"Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it..." [Dan Ariely; Facebook posting; January 6th, 2013]
Big Data Market Potential
McKinsey: multi-million USD in the following areas: [figure]
Source: "Big data: The next frontier for innovation, competition and productivity", McKinsey Global Institute, October 2011
Big Data: Example Use Cases
- Sport tracker (GPS, pulse rate, blood pressure)
- Medical data logger (elderly care at home)
- Connected cars
Big Data feeding Data Warehouses
[Diagram labels]
- Big Data, unstructured data: potential starting point for Big Data analysis
- Big Data, structured data: potential starting point or intermediate stage for Big Data analysis
- Targets: DWH, other use cases
Global data generated per year (exabytes)
Year | Data generated (exabytes)
2005 | 130
2010 | 1227
2012 | 2837
2015 | 8591
2020 | 40026
Source: Statista 06/2014
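The steepness of this curve is easier to judge as a growth rate. The sketch below computes the compound annual growth rate between neighbouring data points. It assumes the chart values belong to the years in ascending order; the `cagr` helper is illustrative and not part of the original slides.

```python
# Exabytes generated per year, as read from the chart (Statista 06/2014).
# Assumption: values are assigned to the years in ascending order.
data_exabytes = {2005: 130, 2010: 1227, 2012: 2837, 2015: 8591, 2020: 40026}


def cagr(start_value, end_value, years):
    """Compound annual growth rate between two data points."""
    return (end_value / start_value) ** (1 / years) - 1


years = sorted(data_exabytes)
growth = {
    (a, b): cagr(data_exabytes[a], data_exabytes[b], b - a)
    for a, b in zip(years, years[1:])
}

overall_factor = data_exabytes[2020] / data_exabytes[2005]

for (a, b), rate in growth.items():
    print(f"{a}-{b}: {rate:.0%} per year")
print(f"2005-2020: factor {overall_factor:.0f}")
```

Over the whole period the yearly volume grows by a factor of roughly 308.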
Structured and unstructured data
90% of the global data is unstructured:
- Pictures
- Music
- Videos
- Social media content
Used by: [figure]
Leading questions for testing
Which types of data are processed in the context of Big Data?
- Unstructured data is used indirectly, not directly.
- The analysis starts with structured data that is generated in an interpretation step.
Focus Big Data Testing
Data Warehouse | Big Data
Large data volume | Operational database with a huge volume of data
Data updates and data archiving in defined intervals from the online DB | Online data
Row oriented | Column oriented / file based
SQL + Tosca iq | NoSQL: non-relational DBs or JSON
Structured data | Structured and unstructured data
Central architecture | Distributed (HDFS) architecture
Summary
Data Warehouse
- "The End of the Food Chain": data quality as an additional risk factor
- Data and process quality can be tested using profiling and data landscaping
Big Data
- New technologies, promising future
- Classical functional testing to analyze the data
- Source for the Data Warehouse: classic methods for monitoring the data quality
Thank You!
Now it's your turn: Questions & Answers