Test Validations for Next Generation Business Intelligence
International Software Testing Conference 2012
Anusha Jaya Murthy Yerraguntla
Infosys Limited (NASDAQ: INFY)
Abstract
Traditional BI approaches such as OLAP and data mining are proving insufficient for rapid analytical decisions. They are becoming less efficient in the current economic scenario, especially where decisions must be based on real-time data. This paper touches briefly on traditional BI and its limitations. It focuses on the key facets of Next Generation BI, an emerging trend in the field of analytics, and on the additional validations that Next Gen BI requires from a testing standpoint.
What is traditional Business Intelligence?
Traditional BI is a way of making business decisions using the data residing in the warehouse and data marts. It comprises analysis spread across three layers: Source, Analytics and Reporting.
Key features:
- Analysis hinges on historical data
- Analysis is performed only on structured and organized data
- Data in the user reports is refreshed at a pre-defined frequency
Fig 1 Layers of BI: Source layer, Analytics layer, Reporting layer
What are the shortcomings of traditional Business Intelligence?
- It is not efficient at delivering the required information/data to the business on demand, which results in delayed and deferred decisions.
- Business reports are updated only at the pre-defined frequency of the data loads.
- It does not help decision makers be proactive about changing requirements. They can only be reactive, since in most cases the opportunities are lost even before analysis.
- The outcome of the decisions made is not in line with management expectations.
- It is inefficient in managing unstructured data, both in analyzing it and in integrating it with structured data.
What are the solutions to these shortcomings?
- Real time BI - A means to make business decisions using real-time data as against historical data.
- Predictive Analytics - A means to predict future business trends based on real-time and historical data.
- Textual Analytics - A means to analyze voluminous, unorganized data and make business decisions out of it.
- CDC (Change Data Capture) on Demand - A means to capture the most recent data as and when needed. This is
What is Next Generation Business Intelligence?
A comprehensive package that combines the features listed below:
- Real time BI
- Predictive Analytics
- Textual Analytics
- CDC on Demand
Next Generation Business Intelligence flow
Fig 2 Next Generation BI flow. Columns: Source, Analytics layer, BI layer, Reports. Components: Structured Source, Unstructured Source, Standard ETL, Data Categorization, EDW, Predictive, Textual, Metadata Management. Legend: real time data, structured source systems, unstructured systems, pro-active cache, micro batch ETL / change data capture / queue validations.
QA validations for Predictive Analytics
- Early validation ensures that the appropriate subjects have been identified.
- Validate that the predictive scores and goals are computed accurately. This involves validating the formulae used by the models.
- Validate the effectiveness of the model by back testing, i.e. comparing forecasts against actuals. The outcome of this testing helps the designers modify the predictive model.
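Back testing as described above can be sketched in a few lines. This is only an illustration: the function name `back_test`, the choice of MAPE and MAE as error metrics, the 10% tolerance and the sample figures are all assumptions, not part of the original material.

```python
from statistics import mean


def back_test(forecasts, actuals, tolerance_pct=10.0):
    """Compare model forecasts against actuals (hypothetical helper).

    Returns the mean absolute percentage error (MAPE), the mean absolute
    error (MAE) and whether the MAPE is within the assumed tolerance.
    """
    if not forecasts or len(forecasts) != len(actuals):
        raise ValueError("forecasts and actuals must be non-empty and equal in length")

    # Percentage error is undefined when the actual is zero, so skip those periods.
    pct_errors = [
        abs(f - a) / abs(a) * 100.0
        for f, a in zip(forecasts, actuals)
        if a != 0
    ]
    if not pct_errors:
        raise ValueError("all actuals are zero; MAPE cannot be computed")

    mape = mean(pct_errors)
    mae = mean(abs(f - a) for f, a in zip(forecasts, actuals))
    return {"mape": mape, "mae": mae, "passed": mape <= tolerance_pct}


# Illustrative figures only: three periods of forecast vs. actual values.
result = back_test(forecasts=[100, 210, 290], actuals=[110, 200, 300])
print(result)
```

A failed check would be the signal to hand the comparison back to the model designers, as the slide suggests.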
QA validations for Textual Analytics
- Validate the metadata of the unstructured text.
- Validate the mechanism for converting unstructured data to structured data (tools, algorithms, business rules).
- Performance test the handling of voluminous data.
- Ensure that "blather" (irrelevant text) is not keyed in as input to the analytics process.
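One simple way to test that blather is kept out of the analytics input is to check each document against a domain vocabulary. The sketch below assumes such a vocabulary exists; `DOMAIN_TERMS`, `is_blather` and `filter_blather` are hypothetical names, and a real text analytics tool would likely use richer rules.

```python
import re

# Hypothetical domain vocabulary; a real project would derive it from business rules.
DOMAIN_TERMS = {"invoice", "refund", "delivery", "order"}


def is_blather(text, domain_terms=DOMAIN_TERMS, min_hits=1):
    """Treat a text as blather when it mentions fewer than min_hits distinct domain terms."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return len(tokens & domain_terms) < min_hits


def filter_blather(documents):
    """Keep only the documents that are relevant to the domain."""
    return [doc for doc in documents if not is_blather(doc)]


docs = [
    "My order arrived late and I want a refund.",
    "lol see you at lunch",
]
kept = filter_blather(docs)
print(kept)
```

A QA check can then assert that nothing flagged by `is_blather` reaches the analytics process.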
QA validations for Change Data Capture (CDC) mechanism
- Validate the data captured in the DW by comparing it against the source for the updated/inserted data, ensuring that the latest change is captured.
- Validate failover and recoverability.
Fig 3 - Differential data capture (Initial Data, Modified Data)
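The source-versus-DW comparison can be sketched as a keyed reconciliation. The row layout (`id`, `status`, `updated_at`), the function name `validate_cdc` and the sample rows are assumptions made for illustration; in practice the rows would come from queries against the source system and the warehouse.

```python
def validate_cdc(source_rows, dw_rows, key="id"):
    """Compare source rows with warehouse rows by key (hypothetical helper).

    missing_in_dw: inserted rows that were never captured.
    stale_in_dw:   rows whose warehouse copy differs from the latest source version.
    """
    source = {row[key]: row for row in source_rows}
    dw = {row[key]: row for row in dw_rows}

    missing = sorted(k for k in source if k not in dw)
    stale = sorted(k for k in source if k in dw and dw[k] != source[k])
    return {"missing_in_dw": missing, "stale_in_dw": stale}


# Illustrative data: row 1 was modified and row 3 was inserted after the last capture.
source_rows = [
    {"id": 1, "status": "shipped", "updated_at": "2012-11-02T10:00:00"},
    {"id": 2, "status": "new", "updated_at": "2012-11-02T10:05:00"},
    {"id": 3, "status": "new", "updated_at": "2012-11-02T10:07:00"},
]
dw_rows = [
    {"id": 1, "status": "packed", "updated_at": "2012-11-01T09:00:00"},
    {"id": 2, "status": "new", "updated_at": "2012-11-02T10:05:00"},
]

report = validate_cdc(source_rows, dw_rows)
print(report)
```

Both lists being empty after a capture cycle would indicate that the latest changes reached the DW.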
Key validations in traditional BI
- Source layer and Analytics layer: schema checks, ETL checks, data quality checks, data model validations, security validations, cube validations
- Report layer: report layout validations, performance validations, data accuracy validations
Fig 4 - Layer wise validations
Summary of the additional skillset needed
BI Phase | Additional QA facet for Next Gen BI
Source Layer | CDC on Demand (triggered by the business event); Recovery/Failover validation
Analytics Layer | Predictive Analytics: back testing for predictive models, predictive score validation. Text Analytics: metadata validation of unstructured data
Reporting Layer | User experience validation: validation of real time alerts and notifications. Validate whether the data flow meets the SLA
Table 1 Additional QA scenarios for Next Gen BI
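The check that the data flow meets the SLA can be sketched as a latency comparison between when a business event occurs and when it appears in the report. The 60-second SLA, the field names `occurred_at` and `reported_at`, and the function `sla_breaches` are illustrative assumptions.

```python
from datetime import datetime, timedelta


def sla_breaches(events, sla=timedelta(seconds=60)):
    """Return the ids of events whose source-to-report latency exceeds the SLA."""
    return [
        event["id"]
        for event in events
        if event["reported_at"] - event["occurred_at"] > sla
    ]


# Illustrative timestamps: event A reaches the report in 20 s, event B in 95 s.
t0 = datetime(2012, 11, 2, 10, 0, 0)
events = [
    {"id": "A", "occurred_at": t0, "reported_at": t0 + timedelta(seconds=20)},
    {"id": "B", "occurred_at": t0, "reported_at": t0 + timedelta(seconds=95)},
]

breaches = sla_breaches(events)
print(breaches)
```

The same comparison could be applied to real time alerts and notifications by recording when each alert is delivered.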
Role to be played by QA to validate Next Gen BI
- Transform from QA into DA, i.e. from a Quality Analyst into a Data Analyst.
- Add a technical flavor to the current skillset.
- Be able to put theoretical business domain knowledge into practice.
- Be able to comprehend the statistical inferences of predictive models.
- Knowledge of mathematical simulation models such as Monte Carlo becomes imperative for analysis.
Questions?
Thank you