QAI's 5th International Colloquium on IT Service Management (ITSM 2010)

Datawarehouse testing using MiniDBs in IT Industry

Narendra Parihar (nparihar@microsoft.com), Anandam Sarcar (asarcar@microsoft.com)

TABLE OF CONTENTS
1. ABSTRACT
2. HOW IT WILL HELP IT INDUSTRY?
3. ONE BOX ARCHITECTURE AND MINIDB ENVIRONMENT
4. DESCRIPTION OF THE PRACTICE
5. LEARNINGS AND BENEFITS
6. LIMITATIONS
7. WHERE DO WE GO FROM HERE?
1. Abstract

Business Intelligence (BI) applications are built on a Datawarehouse (DW) that holds terabytes of data and may pull from as many as 50-100 data sources. Testing such applications requires huge test servers and long data testing cycles. The risk of source systems being down or in restore mode while ETL is running is outside the team's control. To address these issues and improve test effectiveness and productivity, the Microsoft India BI Team came up with an organization-wide concept of using MiniDBs across all BI projects.

[Figure: typical DW architecture. Source System (500 GB), marked "not in DW team control", feeds the stages marked "in our control": DW Staging (300 GB), Datamart (400 GB), and Cubes/BI Reports (100 GB). Run times shown between stages: 10 hours, 2 hours, and 4 hours.]

The diagram above shows a typical DW architecture with its storage needs, ETL run time, and internal job run times.

A MiniDB is a smaller data set with exactly the same schema and objects as the source systems and application databases. Together the MiniDBs form a working model of the application, but with a different topology from the production environment. Using the SQL Server trimming mechanism, we trim the databases to fit into a single box and call the result a One Box environment, which Dev/Test teams use for most of their functional, data, and ETL testing. The test strategy brought many advantages, such as test cycles reduced by 50%, more test coverage, and shorter regression cycles.

This paper describes how to create MiniDBs, how to replicate a working model of a DW, a test strategy for completing BVT in less than 2 hours in a large data warehouse application, and the practical challenges and benefits of implementing this technique in any data warehouse application.
2. How it will help IT industry?

Testing large DW applications has always been a great challenge for test teams across the world. Below are some of the challenges you can probably relate to:

- ETL in my application takes about 10-15 hours every day, so I wait until ETL is complete, test, run it again, and test again. This game of test and wait has been going on for years.
- We don't have enough hardware or budget to give the test environment exactly the same topology as production.
- Backups and restores take half a day for each test drop.
- Smoke testing happens a day late because ETL is running.
- Source systems are down, so we can't run ETL and test.

Enough, we guess! All testers are familiar with the lines above. Our practice of using MiniDBs solves all of these problems. The main objectives of this practice, which will help the IT industry, are listed below:

- Reduce source system dependency.
- Automate data warehouse (business intelligence) testing to save 50% of effort, using MiniDB, a concept tried and tested by the Microsoft India BI Team.
- Provide a test strategy that delivers proven productivity gains, saves about 70% of hardware for the organization, and eliminates external source system dependency.
- Reduce ETL testing time by 80%, and help dev/test teams complete BVT in 2 hours in large DW applications.
- Save hardware budget.
- Invest more time in the actual work of a tester: ensuring quality.
- No more test and wait game.
3. One Box Architecture and MiniDB environment

The MiniDB environment (diagram below) is where you scale down your source databases and actual application databases.

[Figure: MiniDB environment. Source System MiniDB (0.25 GB), DW Staging (0.5 GB), Datamart (2 GB), and Cubes/BI Reports (1 GB). Run times shown between stages: 15 minutes, 15 minutes, and 30 minutes.]

The One Box environment is where you plug in all application components, such as the UI, MiniDBs, jobs, SSRS, SSIS, and SSAS, on a single server, so that apart from topology and amount of data it closely matches your working production application.

[Figure: One Box environment components. Source MiniDBs, Application MiniDBs, Application UI, SQL Jobs, Analysis Cubes, Web Services, and Business Reports.]
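As a rough check on the figures in the two diagrams (production in the abstract, MiniDB above), the totals can be compared directly. This is only a sketch: it assumes the run times shown between stages are sequential and can be summed, and the variable names are ours, not part of the original practice.

```python
# Stage sizes (GB) and run times (minutes) as read off the two diagrams.
# Assumption: the stage run times are sequential, so they can be summed.
production = {"sizes_gb": [500, 300, 400, 100], "run_minutes": [10 * 60, 2 * 60, 4 * 60]}
minidb = {"sizes_gb": [0.25, 0.5, 2, 1], "run_minutes": [15, 15, 30]}


def totals(env):
    """Return (total storage in GB, total run time in minutes) for one environment."""
    return sum(env["sizes_gb"]), sum(env["run_minutes"])


prod_gb, prod_min = totals(production)
mini_gb, mini_min = totals(minidb)

print(f"Production: {prod_gb} GB, {prod_min / 60:.0f} hours end to end")
print(f"MiniDB:     {mini_gb} GB, {mini_min / 60:.0f} hour end to end")
print(f"Run time ratio: {prod_min / mini_min:.0f}x")
```

Under that assumption, the diagrams imply roughly 1300 GB and 16 hours in production against 3.75 GB and 1 hour in the MiniDB environment.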
4. Description of the Practice

Let us take an example from the CSS (customer support service) domain, in which customers contact support teams through different channels such as email, phone, chat, surveys, feedback forms, and community forums. All these service requests (SRs) are logged by support agents through a user interface (a web or desktop application). This data goes into source system databases (MS SQL Server in this example) in different regions, say 5 regions. Our Datawarehouse pulls data from the source systems based on some business logic, applies further transformations, and finally makes the data available for business reporting through Analysis Services cubes (SSAS in this example) or Reporting Services (SSRS in this example). The diagram below represents this flow:

[Figure: data flow diagram]

Now let's look at the corresponding application architecture:
[Figure: application architecture with Source Systems, Staging, and Datamart]

So, now let's see how to go about creating MiniDBs for any DW application:

[Figure: process steps for creating MiniDBs]
Let us go through each process step in the diagram above in more detail.

Firstly, we wanted to make the source DBs smaller and more manageable, populated with data we selected ourselves and therefore richer in quality. One of the difficult challenges was that, with limited time and so many sources, we could not afford to create test data by understanding each and every source system. Instead, we took a generic approach: start from a subset of the source data and iteratively enrich it with the data we needed for testing.

Put all of your logic for extracting the mini data set from each source into database jobs. Example code is shown below (an MS SQL Server compliant snippet):

    -- Drop any previous copy of the source MiniDB so the job can be rerun.
    IF EXISTS (SELECT * FROM sys.databases WHERE name = 'SourceOneDBName')
    BEGIN
        ALTER DATABASE SourceOneDBName SET SINGLE_USER WITH ROLLBACK IMMEDIATE
        DROP DATABASE SourceOneDBName
    END
    GO
    CREATE DATABASE SourceOneDBName
    GO
    USE SourceOneDBName
    GO
    -- Copy a restricted subset of each table from the real source server.
    SELECT * INTO SRExampletable
    FROM ServerName.SourceOneDBName.dbo.SRExampletable WITH (NOLOCK)
    WHERE <condition 1> AND <condition 2> AND ...

    SELECT * INTO SRUserExampletable
    FROM ServerName.SourceOneDBName.dbo.SRUserExampletable WITH (NOLOCK)
    WHERE <condition 1> AND <condition 2> AND ...

The job steps must create the same tables and schema as the real source, so that the jobs which pull data from the source MiniDBs do not fail. In the WHERE conditions you can use a date range or other conditions to restrict the data to, say, 7 or 14 days. Pull the rest of the tables relationally (only the rows related to the rows you kept) after understanding the application code.

The end result of this process is that all source system databases exist on your server instance, each with perhaps only 100 records or so. Now you can change your job configurations to pull data from these source MiniDBs rather than from the original sources. You can even schedule these jobs so that the source MiniDBs hold the latest, but smaller, data set every day. Remember to take backups of these source DBs; they can be used to rebuild the One Box environment in less than 30 minutes.

Iterating on this process helps us arrive at an intelligent selection of test data. It also helps us understand our data better: creating test data before every release brings us closer and closer to the data.

Secondly, we wanted to make our own application DBs smaller and leaner to address the ETL and hardware challenges. Approach number one for this is as follows:

- Identify the master and transactional tables in the database.
- Create the schema of all the objects in sample databases from the existing test database.
- Populate the master data by setting the full-pull delta date to 1-1-1753.
- Populate the transactional data by setting the pull date to the date used when setting up the source system MiniDBs (described in the previous section).
- While doing the analysis, we found some hardcoded values for certain tables, which we manually inserted into the [...]
- Take a backup of all the databases and restore it whenever the test environment needs to be refreshed.

The end result of this step should be a job which you can run at any time to create MiniDBs of the application databases.
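The population steps of approach one can be sketched in a few lines. This is an illustrative sketch only, not the team's job scripts: it uses Python's built-in sqlite3 module in place of SQL Server so that it runs standalone, and the table names, column names, and sample rows are hypothetical. Master tables get a full pull by using 1753-01-01 as the delta date; transactional tables are restricted to rows on or after the chosen pull date.

```python
import sqlite3

# 1753-01-01 is the minimum SQL Server datetime; using it as the delta date forces a full pull.
FULL_PULL_DATE = "1753-01-01"


def make_source():
    """Build a stand-in source database (hypothetical schema and sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Region (RegionId INTEGER PRIMARY KEY, Name TEXT, ModifiedDate TEXT);
        CREATE TABLE ServiceRequest (SrId INTEGER PRIMARY KEY, RegionId INTEGER, CreatedDate TEXT);
        INSERT INTO Region VALUES (1, 'APAC', '2005-03-01'), (2, 'EMEA', '2006-07-15');
        INSERT INTO ServiceRequest VALUES
            (10, 1, '2010-01-02'), (11, 2, '2010-02-20'), (12, 1, '2010-02-25');
    """)
    return conn


def build_minidb(conn, pull_date):
    """Create a second database 'mini' holding all master rows and recent transactional rows."""
    conn.execute("ATTACH DATABASE ':memory:' AS mini")
    plan = (
        ("Region", "ModifiedDate", FULL_PULL_DATE),   # master table: full pull
        ("ServiceRequest", "CreatedDate", pull_date),  # transactional table: date window
    )
    for table, date_col, since in plan:
        # WHERE 0 copies the column layout without copying any rows.
        conn.execute(f"CREATE TABLE mini.{table} AS SELECT * FROM main.{table} WHERE 0")
        # ISO-8601 date strings compare correctly as text.
        conn.execute(
            f"INSERT INTO mini.{table} SELECT * FROM main.{table} WHERE {date_col} >= ?",
            (since,),
        )
    conn.commit()


conn = make_source()
build_minidb(conn, "2010-02-19")
print(conn.execute("SELECT COUNT(*) FROM mini.Region").fetchone()[0], "master rows")
print(conn.execute("SELECT COUNT(*) FROM mini.ServiceRequest").fetchone()[0], "transactional rows")
```

Keeping the table list and pull dates in one plan makes the job easy to rerun with a different window as the test data is refined over iterations.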
Approach number two for this is as follows:

- Restore the latest production database backups on the intended One Box environment.
- Use SQL statements to trim the tables in the different application databases, except the master tables.
- Be careful when truncating data in your job; take care of the relationships between tables.
- Take a backup of all the databases and restore it whenever the test environment needs to be refreshed.

The end result of this step should be a job which you can run at any time to create MiniDBs of the application databases.

Thirdly, check all connectivity, test the connections from each database, and verify that you have the right data. This might take quite a few iterations before you arrive at the final job scripts, but it is worth the investment.

Once you hook up all the other elements of your application, you will have a MiniDB environment on a single box in which you can run ETL in a few minutes, do some tests, run ETL again, and so on.

Trick: Don't forget to shrink the databases after the truncate statements in your jobs.

Your final One Box environment should look like the diagram below:
[Figure: final One Box environment with Source MiniDBs, Staging MiniDBs, and Datamart MiniDBs]
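The trimming step of approach two, including the care needed around relationships and the shrink afterwards, might look like the sketch below. It is an assumption-laden illustration rather than the team's actual job: it uses Python's built-in sqlite3 module in place of SQL Server, the schema and rows are hypothetical, and SQLite's VACUUM stands in for a SQL Server shrink (for example DBCC SHRINKDATABASE). Child rows are deleted before their parents so that no foreign key is violated, and the master table is left untouched.

```python
import sqlite3


def make_app_db():
    """Build a stand-in application database (hypothetical schema and sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce relationships, as a real DW would
    conn.executescript("""
        CREATE TABLE Region (RegionId INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE ServiceRequest (
            SrId INTEGER PRIMARY KEY,
            RegionId INTEGER REFERENCES Region(RegionId),
            CreatedDate TEXT);
        CREATE TABLE SrNote (
            NoteId INTEGER PRIMARY KEY,
            SrId INTEGER REFERENCES ServiceRequest(SrId),
            Body TEXT);
        INSERT INTO Region VALUES (1, 'APAC');
        INSERT INTO ServiceRequest VALUES (10, 1, '2009-06-01'), (11, 1, '2010-02-20');
        INSERT INTO SrNote VALUES (100, 10, 'old'), (101, 11, 'recent');
    """)
    return conn


def trim_database(conn, keep_from):
    """Delete transactional rows older than keep_from, then reclaim the freed space."""
    # Children first: notes that belong to service requests about to be removed.
    conn.execute(
        "DELETE FROM SrNote WHERE SrId IN "
        "(SELECT SrId FROM ServiceRequest WHERE CreatedDate < ?)",
        (keep_from,),
    )
    # Parents second; the master table Region is deliberately not touched.
    conn.execute("DELETE FROM ServiceRequest WHERE CreatedDate < ?", (keep_from,))
    conn.commit()
    # Shrink after trimming; VACUUM must run outside a transaction.
    conn.execute("VACUUM")


conn = make_app_db()
trim_database(conn, "2010-01-01")
print(conn.execute("SELECT SrId FROM ServiceRequest").fetchall())
```

Reversing the two DELETE statements would fail here with a foreign key error, which is the kind of relationship problem the warning above refers to.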
5. Learnings and Benefits

Some of the learnings and benefits we take from this tried practice are below:

- The practice has been used by Dev/Test for more than a year now in the Microsoft India Business Intelligence team, and is spreading widely as a new approach to data warehouse and business intelligence testing.
- Entire functional and ETL test passes are completed with very limited hardware.
- BVT time is reduced to less than 2 hours in Datawarehouse/Business Intelligence projects.
- Hours spent on regression cycles and ETL testing are reduced.
- ETL job run times have improved drastically, from hours to minutes.
- Test environment setup takes no more than 60 minutes, and functional testing turnaround is very quick. This helps particularly when we have to release QFEs or production issue fixes.
- There is no dependency on external sources, and the practice saves money ($, INR).
6. Limitations

Some of the limitations of MiniDBs are:

- Understanding the data takes some time, and there are practical challenges: you may need training on the entire end-to-end technical setup of the application.
- If your ETL has both historical and daily pull logic, you will need to trim the data in such a manner that you can do functional tests in both situations.
- Not all functional tests can be done with MiniDBs.
- Performance testing is not recommended in a MiniDB environment.
- There is some maintenance effort involved.

7. Where do we go from here?

It would be great if you tried out the MiniDB and One Box environment approach to help your Dev/Test teams do their testing faster. We have been using it in our team for more than a year and find that it saves up to 50% of test effort.

You can contact the authors at the email addresses below:

Narendra Parihar: Nparihar@microsoft.com, or comment on my blog at www.narendraparihar.co.nr
Anandam Sarcar: Asarcar@microsoft.com