
QAI's 5th International Colloquium on IT Service Management (ITSM 2010)

Datawarehouse Testing Using MiniDBs in the IT Industry

Narendra Parihar (nparihar@microsoft.com), Anandam Sarcar (asarcar@microsoft.com)

TABLE OF CONTENTS
1. Abstract
2. How Will It Help the IT Industry?
3. One Box Architecture and MiniDB Environment
4. Description of the Practice
5. Learnings and Benefits
6. Limitations
7. Where Do We Go from Here?

1. Abstract

Business Intelligence (BI) applications are built around data warehouses (DW) that hold terabytes of data and can draw from as many as 50-100 data sources. Testing such applications requires huge test servers and long data-testing cycles, and the risk of source systems being down or in restore mode while ETL runs is not in the team's control. To address these issues and improve test effectiveness and productivity, the Microsoft India BI Team came up with an organization-wide concept of using MiniDBs across all BI projects.

[Diagram: a typical DW architecture with its storage needs, ETL run times, and internal job run times. The Source System (500 GB), which is not in the DW team's control, feeds DW Staging (300 GB) via a 10-hour ETL; Staging feeds the Datamart (400 GB) in 2 hours; the Datamart feeds Cubes/BI Reports (100 GB) in 4 hours. Everything from Staging onward is in our control.]

A MiniDB is a smaller-data-set version of the source systems and application databases, with exactly the same schema and objects: a working model, but with a different topology than the production environment. Using SQL Server trimming mechanisms we trim the databases to fit onto a single box, which we call the One Box environment; Dev/Test teams use it to do most of their functional, data, and ETL testing. This test strategy delivered many advantages: test cycles reduced by 50%, more test coverage, and shorter regression cycles. This paper describes how to create MiniDBs, how to replicate a working model of a DW, a test strategy for completing BVT in less than 2 hours in large data warehouse applications, and the practical challenges and benefits of implementing this technique in any data warehouse application.

2. How Will It Help the IT Industry?

Testing large DW applications has always been a great challenge for test teams across the world. Below are some of the challenges you can probably relate to easily:

- ETL in my application takes about 10-15 hours every day, so I wait till ETL is complete to test, then run it again, then test; this game of test-and-wait has been going on for years
- We don't have enough hardware or budget to replicate the production topology exactly in test
- Backup and restore takes half a day on each test drop
- Smoke testing happens a day later because ETL is still running
- Source systems are down, so we can't run ETL and test

Enough, we guess! All testers are familiar with the lines above. Our practice of using MiniDBs solves all of these problems. The main objectives of this practice, and how it helps the IT industry, are listed below:

- Reduce source system dependency
- Automate data warehouse (business intelligence) testing to save 50% of the effort, using a practice tried at the Microsoft India Business Intelligence team: the MiniDB concept
- A test strategy with proven productivity gains that saves about 70% of hardware for the organization and eliminates external source system dependency
- Reduce ETL testing time by 80%; useful for dev and test teams, and enables completing BVT in 2 hours in large DW applications
- Save hardware budget
- Invest more and more of a tester's time in the actual work of ensuring quality
- No more test-and-wait game

3. One Box Architecture and MiniDB Environment

The Microsoft India MiniDB environment (diagram below) is where you scale down your source databases and actual application databases. The One Box environment is where you plug in all application components (UI, MiniDBs, jobs, SSRS, SSIS, and SSAS) on a single server, so that, apart from topology and the amount of data, it closely matches your working production application.

[Diagram: the MiniDB version of the same architecture. The Source System MiniDB (0.25 GB) feeds DW Staging (0.5 GB) in 15 minutes; Staging feeds the Datamart (2 GB) in 15 minutes; the Datamart feeds Cubes/BI Reports (1 GB) in 30 minutes.]

The One Box server hosts:
- Source MiniDBs
- Application MiniDBs
- Application UI
- SQL jobs
- Analysis cubes
- Web services
- Business reports

4. Description of the Practice

Let us take an example from the CSS (Customer Support Services) domain, where customers contact support teams through different channels such as email, phone, chat, surveys, feedback forms, and community forums. All these service requests (SRs) are logged by support agents through a user interface (a web or desktop application). The data lands in source system databases (MS SQL Server in this example) in different regions, say 5 regions. Our data warehouse pulls data from the source systems based on business logic, applies further transformations, and finally the data is ready for business reporting through Analysis Services cubes (SSAS in this example) or Reporting Services (SSRS in this example). The diagram below illustrates this flow:

[Diagram: support channels feeding the UI, regional source systems, the data warehouse, and finally cubes and reports.]

Now let's look at the corresponding application architecture:

[Diagram: application architecture showing the Source Systems, Staging, and Datamart layers.]

So, now let's get on with how to create MiniDBs for any DW application:

Let us detail each process step for better understanding.

Firstly, we wanted to make the source DBs smaller and manageable, with our own created data that would be richer in quality:

One of the difficult challenges was that, with limited time on our hands and so many sources, we could not afford to create test data by understanding each and every source system. Instead, we took a generic approach: use a subset of the data to seed the source MiniDBs, and iteratively enrich it with the data we needed for testing.

Put all of your logic for building the mini data set from each source into database jobs. An example follows (a SQL Server compliant code snippet; SourceOneDBName, ServerName, and the table names are placeholders):

IF EXISTS (SELECT * FROM sys.databases WHERE name = 'SourceOneDBName')
BEGIN
    ALTER DATABASE SourceOneDBName SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE SourceOneDBName
END
GO
CREATE DATABASE SourceOneDBName
GO
USE SourceOneDBName
GO
SELECT * INTO SRExampletable
FROM ServerName.SourceOneDBName.dbo.SRExampletable WITH (NOLOCK)
WHERE <condition 1> AND <condition 2> AND ...

SELECT * INTO SRUserExampletable
FROM ServerName.SourceOneDBName.dbo.SRUserExampletable WITH (NOLOCK)
WHERE <condition 1> AND <condition 2> AND ...

You basically need to keep the same tables and schema when creating job steps like the above, so that when your jobs run to pull data from the source MiniDBs they don't fail. In the WHERE clauses you can use a date range or other conditions to restrict the data to, say, 7 or 14 days, and pull the rest of the tables relationally after understanding the application code. The end result of this process is that you have all source system databases, each with perhaps 100 records, on your own server instance. Now you can change your job configurations to pull data from these source MiniDBs rather than the original sources. You can even schedule these jobs so that you have the latest, but smaller, data set in the source MiniDBs every day. Remember to take backups of these source DBs; they can be used to rebuild the One Box environment in less than 30 minutes.

Iterating this process helps us make an intelligent selection of test data. It also helps us understand our data better: we create test data before every release and get closer and closer to the data.

Secondly, we wanted to make our application DBs smaller and leaner to address the ETL and hardware challenges. Approach one is as follows:

- Identify the master and transactional tables in the database
- Create the schema of all objects in sample databases from the existing test database
- Populate the master data by setting the full-pull delta date to 1-1-1753
- Populate the transactional data by setting the pull date to the date used when creating the source MiniDBs (described in the previous section)
- While doing the analysis we found some hardcoded values in certain tables, which we inserted manually
- Take a backup of all the databases and restore it whenever the test environment needs to be refreshed

The end result of this step should be a job you can run at any time to create MiniDBs of the application databases.
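The core trimming idea running through the steps above, copy the schema and pull only a date-restricted subset of rows, can be sketched in miniature. This is an illustration only, not the team's actual scripts: SQLite stands in for SQL Server, and the table and column names (SRExampletable, created_on) are invented for the example.

```python
import sqlite3

def build_mini_source(src, mini, table, days=7):
    """Copy one table's schema into the mini DB, then pull only the
    last `days` days of rows, mimicking the date-range WHERE clause
    of the SQL Server job described in the text."""
    # Recreate the table's schema in the mini database.
    (ddl,) = src.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND name=?",
        (table,)).fetchone()
    mini.execute(ddl)
    # Pull only recent rows; the date window plays the role of
    # "<condition 1>" in the SQL job.
    cur = src.execute(
        f"SELECT * FROM {table} WHERE created_on >= date('now', ?)",
        (f'-{days} days',))
    placeholders = ",".join("?" * len(cur.description))
    mini.executemany(f"INSERT INTO {table} VALUES ({placeholders})", cur)
    mini.commit()

# Example: a source table with one stale and one recent service request.
src = sqlite3.connect(":memory:")
mini = sqlite3.connect(":memory:")
src.execute("CREATE TABLE SRExampletable (id INTEGER, title TEXT, created_on TEXT)")
src.execute("INSERT INTO SRExampletable VALUES (1, 'old SR', date('now','-30 days'))")
src.execute("INSERT INTO SRExampletable VALUES (2, 'new SR', date('now'))")
build_mini_source(src, mini, "SRExampletable")
```

Run over every source table, this is the same shape as the job steps above: the mini database ends up with identical schema but only a few days of data.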

Approach two is as follows:

- Restore the latest production databases and their backups onto the prospective One Box environment
- Use SQL statements to trim the tables in the different application databases, except the master tables
- Be careful when truncating data in your job; take care of the relationships between tables
- Take a backup of all the databases and restore it whenever the test environment needs to be refreshed

The end result of this step should again be a job you can run at any time to create MiniDBs of the application databases.

Thirdly, check all connectivity, do test connections from each database, and verify that you have the right data. This might take quite a few iterations before you arrive at the final job scripts, but it is worth the investment. Once you hook up all the other elements of your application, you will have a MiniDB environment on a single box where you can start testing ETL in a few minutes, run some tests, run ETL again, and so on.

Trick: don't forget to shrink the databases after the TRUNCATE statements in your jobs.

Your final One Box environment should look like this:

[Diagram: the One Box server hosting the Source MiniDBs, Staging MiniDBs, and Datamart MiniDBs.]
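Approach two's trim-and-shrink step can also be sketched in miniature, again with SQLite standing in for SQL Server: DELETE plus VACUUM play the role of TRUNCATE plus a database shrink, and the table names (SRFact, RegionMaster) are invented for the example.

```python
import sqlite3

def trim_in_place(conn, transactional_tables, keep_days=7):
    """Trim transactional tables down to a recent window, leave master
    tables untouched, then reclaim the freed space (the 'shrink' trick)."""
    for table in transactional_tables:
        # Pass tables child-first so dependent rows are removed before
        # their parents, respecting foreign-key relationships.
        conn.execute(
            f"DELETE FROM {table} "
            f"WHERE created_on < date('now', '-{keep_days} days')")
    conn.commit()
    conn.execute("VACUUM")  # SQLite's analogue of shrinking the database

# Example: one master table (kept whole) and one transactional table (trimmed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RegionMaster (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE SRFact (id INTEGER, region_id INTEGER, created_on TEXT)")
conn.executemany("INSERT INTO RegionMaster VALUES (?, ?)", [(1, 'APAC'), (2, 'EMEA')])
conn.execute("INSERT INTO SRFact VALUES (1, 1, date('now','-90 days'))")
conn.execute("INSERT INTO SRFact VALUES (2, 2, date('now'))")
trim_in_place(conn, ["SRFact"])
```

The point of shrinking after the deletes is that trimming alone only marks pages free; reclaiming them is what makes the restored MiniDB backups small and fast to copy around.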

5. Learnings and Benefits

Some of the learnings and benefits we take from this tried practice:

- The practice has been in use for more than a year at the Microsoft India Business Intelligence team, by both Dev and Test, and is spreading as a new approach to data warehouse and business intelligence testing
- Entire functional and ETL test passes completed with very limited hardware
- BVT time reduced to less than 2 hours on data warehouse and business intelligence projects
- Fewer hours spent on regression cycles and ETL testing
- End-to-end ETL job run times improved drastically, from hours to minutes
- Test environment setup takes no more than 60 minutes, and the functional testing turnaround is very quick; this helps especially when releasing QFEs and production issue fixes
- No dependency on external sources, plus savings in dollars and rupees

6. Limitations

Some of the limitations of MiniDBs:

- Understanding the data takes time; there are practical challenges where you may need training on the entire end-to-end technical setup of the application
- If your ETL has both historical and daily pull logic, you will need to trim the data in such a manner that you can run functional tests in both situations
- Not all functional tests can be done with MiniDBs
- Performance testing is not recommended on a MiniDB environment
- There is some maintenance effort involved

7. Where Do We Go from Here?

It would be great if you tried out the MiniDB and One Box environment approach to help your Dev/Test teams do their testing faster. We have been using it in our team for more than a year and find that it saves up to 50% of test effort. You can contact the authors at the email addresses below:

Narendra Parihar: Nparihar@microsoft.com, or comment on my blog at www.narendraparihar.co.nr
Anandam Sarcar: Asarcar@microsoft.com