Keywords: Data Warehouse, Data Warehouse testing, Lifecycle based testing, performance testing.



Similar documents
Keywords : Data Warehouse, Data Warehouse Testing, Lifecycle based Testing

Data Warehouse Testing

Turkish Journal of Engineering, Science and Technology

Model based testing of Datawarehouse

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Data Warehousing and Data Mining in Business Applications

A Comprehensive Approach to Master Data Management Testing

Requirements are elicited from users and represented either informally by means of proper glossaries or formally (e.g., by means of goal-oriented

Optimization of ETL Work Flow in Data Warehouse

ETL-EXTRACT, TRANSFORM & LOAD TESTING

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Whitepaper. Data Warehouse/BI Testing Offering YOUR SUCCESS IS OUR FOCUS. Published on: January 2009 Author: BIBA PRACTICE

Deriving Business Intelligence from Unstructured Data

An Approach for Facilating Knowledge Data Warehouse

A Survey on Web Mining From Web Server Log

A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

A Knowledge Management Framework Using Business Intelligence Solutions

Comparative Analysis of Data warehouse Design Approaches from Security Perspectives

Advanced Data Management Technologies

Data Warehouse: Introduction

Proven Testing Techniques in Large Data Warehousing Projects

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Fluency With Information Technology CSE100/IMT100

A Framework for Developing the Web-based Data Integration Tool for Web-Oriented Data Warehousing

Data Warehousing Systems: Foundations and Architectures

Metadata Management for Data Warehouse Projects

B.Sc (Computer Science) Database Management Systems UNIT-V

A Critical Review of Data Warehouse

Big Data-Challenges and Opportunities

MASTER DATA MANAGEMENT TEST ENABLER

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

IST722 Data Warehousing

An Overview of Knowledge Discovery Database and Data mining Techniques

SQL Server 2012 End-to-End Business Intelligence Workshop

BUILDING OLAP TOOLS OVER LARGE DATABASES

Fusion Applications Overview of Business Intelligence and Reporting components

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

Integrated Data Mining and Knowledge Discovery Techniques in ERP

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April ISSN

A Review of Data Mining Techniques

Namrata 1, Dr. Saket Bihari Singh 2 Research scholar (PhD), Professor Computer Science, Magadh University, Gaya, Bihar

Dimensional Modeling for Data Warehouse

Enterprise Data Quality

TDWI strives to provide course books that are content-rich and that serve as useful reference documents after a class has ended.

Whitepaper. Data Warehouse/BI Testing Offering. Published on: January 2010 Author: Sena Periasamy

BIG DATA COURSE 1 DATA QUALITY STRATEGIES - CUSTOMIZED TRAINING OUTLINE. Prepared by:

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies

Design of Electricity & Energy Review Dashboard Using Business Intelligence and Data Warehouse

CONCEPTUALIZING BUSINESS INTELLIGENCE ARCHITECTURE MOHAMMAD SHARIAT, Florida A&M University ROSCOE HIGHTOWER, JR., Florida A&M University

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8

MDM and Data Warehousing Complement Each Other

A Survey on Data Warehouse Architecture

A Comparative Study on Operational Database, Data Warehouse and Hadoop File System T.Jalaja 1, M.Shailaja 2

QA Tools (QTP, QC/ALM), ETL Testing, Selenium, Mobile, Unix, SQL, SOAP UI

Understanding Data Warehousing. [by Alex Kriegel]

DATA WAREHOUSING AND OLAP TECHNOLOGY

How to Enhance Traditional BI Architecture to Leverage Big Data

Business Intelligence In SAP Environments

Data warehouse and Business Intelligence Collateral

Trustworthiness of Big Data

Introduction to Data Mining

Research on Airport Data Warehouse Architecture

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Establish and maintain Center of Excellence (CoE) around Data Architecture

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Data Warehousing and OLAP Technology for Knowledge Discovery

SQL Server 2005 Features Comparison

Testing Big data is one of the biggest

Microsoft Data Warehouse in Depth

Applied Business Intelligence. Iakovos Motakis, Ph.D. Director, DW & Decision Support Systems Intrasoft SA

Outline Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring

Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments.

Course Outline: Course: Implementing a Data Warehouse with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning

TOWARDS A FRAMEWORK INCORPORATING FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTS FOR DATAWAREHOUSE CONCEPTUAL DESIGN

Implementing a Data Warehouse with Microsoft SQL Server 2012

Distance Learning and Examining Systems

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777

Data Mining for Successful Healthcare Organizations

MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

SimCorp Solution Guide

THE QUALITY OF DATA AND METADATA IN A DATAWAREHOUSE

A Model-based Software Architecture for XML Data and Metadata Integration in Data Warehouse Systems

MDM for the Enterprise: Complementing and extending your Active Data Warehousing strategy. Satish Krishnaswamy VP MDM Solutions - Teradata

Bringing agility to Business Intelligence Metadata as key to Agile Data Warehousing. 1 P a g e.

Business Intelligence in E-Learning

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

Data Warehouse Testing An Exploratory Study

Student Performance Analytics using Data Warehouse in E-Governance System

An Overview of Database management System, Data warehousing and Data Mining

The University of Jordan

What is Customer Relationship Management? Customer Relationship Management Analytics. Customer Life Cycle. Objectives of CRM. Three Types of CRM

A Survey of ETL Tools

Analyzing Polls and News Headlines Using Business Intelligence Techniques

Webinar. Feb

A Data Warehouse Design for A Typical University Information System

A Survey of Real-Time Data Warehouse and ETL

Transcription:

DOI 10.4010/2016.493 ISSN2321 3361 2015 IJESC Research Article December 2015 Issue Performance Testing Of Data Warehouse Lifecycle Surekha.M 1, Dr. Sanjay Srivastava 2, Dr. Vineeta Khemchandani 3 IV Sem, MTech 1, Professor & HOD 2, Associate professor 3 Department of CSE 1, 2 MGM College of Engineering, Noida 1, 2 JSSATE, Noida Uttar Pradesh Technical University, India 3 surekhashivakumar@gmail.com 1, sanjaysrivastava@coet.in 2, vkhemchandani@jssaten.ac.in 3 Abstract: Testing is an essential part of design lifecycle of any software product. Testing of Data warehouse is very important in projects because users need to trust the quality of information they access. The reasons for this are increase in Enterprise Mergers & Acquisitions, Compliance Regulations, Increased focus on data by Management and data driven decision makings. Data warehouse is a collection of large amount of data which is used by management for making strategic decisions. In this paper, we introduce a performance testing of Data warehouse Lifecycle and implement the same for a dataset using Naïve Bayes and K-Nearest Neighbour algorithm and made comparison between these two discuss how it can help solve some of the challenges in data warehouse This paper also proposes implementation details and future roadmap for model based data warehouse Keywords: Data Warehouse, Data Warehouse testing, Lifecycle based testing, performance I. Introduction to Data Warehouse In Computing Data warehouse is a system used for Reporting and data analysis. DWs are central repositories of integrated data from one or more desperate sources. They store historical and current data are used for creating analytical reports for knowledge workers throughout the enterprise. It is also called enterprise data Ware house (EDW). A Data warehouse is a Subject-oriented, time variant, integrated and nonvolatile collection of data in support of management s decision making process. A data warehouse is a copy of transaction data specifically structured for analysis and query. Data warehouse testing is carried out to check the quality of data stored. Testing should focus on Transformation of data, quality and performance of data and completeness of data. This paper is organized as follows section 2 briefly explains the survey of different DWT techniques. Section 3 explains proposed work. Section 4 explains advantages of lifecycle. Section 5 represents results and performance evaluation. Section 6 contains conclusion Data Warehouse Testing DWT itself has multiple phases and is staggered throughout the lifecycle of DW implementation. DWT is focuses on quality, consistency, completeness and validation of data. Testing implementation undergoes the cycle of unit testing Integration testing, System testing and usability testing and also focus on test cases generation requirement gathering and for the system. Data Warehouse Testing VS Software Testing DWT has a broader scope than software testing as it focus upon the usefulness and correctness of data. DWT always deals with huge volume of Data. DWT is post placement activity where as software testing is prior placement activity of software DWT test cases are unlimited. Software testing test cases are limited. DWT is System triggered whereas software testing is user triggered. DWT focus on validating the data whereas software testing focuses on the code. Challenges in Data Warehouse Testing As it is accepted that DW is different from other System, Hence there are some challenges in DW II. Different types of formatted data. Data of Heterogeneous sources. Huge volume of Data. Lack of testing tools knowledge Lack of requirements clarity Missing values in large volume of data One of the core challenges of testing DWS or providing technique for testing DW is its flexible architecture DW system could have different architectures. RELATED WORK In [1] Naveen ElGamal proposed to show the workflow of the framework when it is put in to operation. The key player of the DW testing process is the Test manager, who feeds the system with the DW architecture under test and current state of DW.DW architecture Analyzer component then studies the received data and compare it with the dependency graph with the assistance of test dependency manager component and passes the data to test recommender to generate an 2022 http://ijesc.org/

abstract test plan, then executed validation and verification phase. Validation manager involves the business experts and system experts where as verification manager involves system tester and DB administrator for assistance in process of Test Case preparation. In[2] Kuldeep Deshpande proposed model based testing is a technique for automatic generation and execution of test cases based on formal models of system under test(sut).model based testing can be applied at various levels of testing i.e. unit testing, integration test and system In [3] Syntel white paper proposed object oriented framework for DW conceptual design. Framework is divided in to two levels namely-requirement level and Design level. At the requirement level, the requirements are gathered from different users and a thorough analysis is made. In design level UML designs helps in Extracting major objects and classes from data gathered and construction of UML class diagrams. In [4] Rajini Jindal and Shweta Taneja proved testing technique in large DW projects. Paper suggests that Industry best practices in DWH testing are data completeness and quality check, BI report data tasting, Performance validation of ETL and reports. Critical success factors for testing are Referential integrity of facts and Dimensions, Risk based Testing, data abfuscation, effective defect management and Focus on automation. In [5] Naveen ElGamal, Ali Ei Bastawissy and Galal Edeen proposed DW matrices which is categorized as what, where and when These categories will result in a three dimensional matrix. As shown in table1, the rows represent the where dimension, the columns represent the what dimension, and later on the when dimension is represented in color. Each cell of this table will consist of the group of test routines that addresses combination of dimension members.this matrix is used to compare the existing DW testing approaches. In [7] Execution-MiH explained categories of DW testing which includes different stages of the process. The testing is done on individual and end to end basis. Testing will include extraction testing transformation testing loading testing, end user testing end user browsing and OLAP testing, Adhoc Query testing, Stress and volume testing and parallel In [8] Golfarelli M. and Rizzi S introduced data warehouse testing activities framed within a DW development methodology. They stated that, The components needs to be tested: conceptual schema, Logical Schema, ETL procedures, Database and frontend. Table2: DW components vs testing types[8] III.PROPOSED WORK The testing activities in data warehousing projects begin with the requirement gathering phase and carried out in an iterative manner. In data warehousing testing, every component of the project needs to be tested. Figure 1 shows the proposed lifecycle for data warehouse testing containing following phases. Table 1: DW Testing Matrices In [6] Manoj Philip Mathen proposed two focus points for DW At a high level, Any strategy should focus on the two main aspects mentioned as underlying data and DW components. Underlying data includes Data coverage and data complying with the Transformation logic in accordance with the Business Rules. Whereas in DW Components he mentioned: performance, scalability, component orchestration tests and regression Figure1: Proposed Lifecycle of Data Warehouse Testing 2023 http://ijesc.org/

We are briefly described the above phases as follows: 1. DATA GATHERING PHASE : All the data is collected from different sources like SQL server, access sheets. In this paper we used Data from Banking Domain as shown in figure1 Reviewed data. 2. REVIEW The data collected is reviewed for any correction present in the data. All data which is to be tested should be included in the data gathering. data if parameters are age, sex, income etc in Naive Bayes classifier considers each of these features contribute independently to the probability of data being in an existing data, regardless of the presence or absence of the other features. K-Nearest Neighbor: K nearest neighbors is a simple algorithm that stores all available cases and Classifies new cases based on a similarity measure (e.g., distance functions). Algorithm A case is classified by a majority vote of its neighbors, with the case being assigned to the class most common amongst its K nearest neighbors measured by a distance function. If K = 1, then the case is simply assigned to the class of its nearest neighbor. It should also be noted that all three distance measures are only valid for Continuous variables. In the instance of Categorical variables the Hamming Distance must be used. Figure 1: Reviewed Data 3. PLANNING In this phase, test data is prepared for In this phase how testing is to be carried out is planned. Decision for selecting of which algorithm is made here. In this paper selected testing algorithms are Naive bays and K-nearest neighbor. 4. REQUIREMENT TESTING In this emphasis is given defining business rules and requirement stated should be complete, clear, consistent and understandable. Interface review must be done to check the usability of the system. Tools must be decided in this phase on which testing must be performed. 5. TEST CASE GENERATION Different test cases are generated by using different combinations and according tools are set to operate. 6. TESTING Different testing techniques are mentioned: Testing phase involves selection of algorithm to be implemented to test DW performance. Selected algorithm in our case are Naive Bayes and KNN (K-Nearest Neighbor) Naïve Bayes: Briefly explaining Naïve Bayes classifier, In simple terms, a Naive Bayes classifier assumes that the value of a particular feature is unrelated to the presence or absence of any other feature, given the class variable. For example, In banking It also brings up the issue of standardization of the numerical variables between 0 and 1 when there is a mixture of numerical and Categorical variables in the dataset. Unit Testing: In this white box testing is performed. The developer loads the data from data source, data modules is tested individually. Integration testing: the process of combining and testing the components together one by one to check their integrity issues and to make sure they perform well working together. System Testing: this testing executes at developer site to make sure the system performs well. All the components run well together. Usability testing: this is a black box This checks for fulfillment of the requirements and ensure the validation of data. 7. PERFORMANCE TEST The proposed lifecycle executed well in implementation as we are getting validated data as a result of querying the data through.net interface. In WEKA data is validated. Then unit testing is done on single units. Then proceed to integration testing by integrating WEKA, SQL SERVER and UI together to check whether following components worked well. Data is tested against the accuracy of the data sets and performance is judged on the basis of different data mining attributes like precision, accuracy and time taken. 2024 http://ijesc.org/

IV. FEATURES OF PROPOSED TESTING LIFECYCLE FOR DATA WAREHOUSE 1. Test planning is done prior to the test cases development. 2. Stress on Requirement testing is given as it is important for the need of development of the system and tools is studied for testing the data. 3. Strategy is prepared as all the test data should be covered in test cases so that defects can be recognized at the early stages. 4. Testing the data on the basis of accuracy is the major part as in data mining there are many algorithms to choose, the best algorithm according to the need is chosen. 5. Testing is made iterative to make it easy for testers to change the system according to the requirements or add on any strategy without any hindrance. V. IMPLEMENTATION RESULTS Testing of data is done using the proposed lifecycle. Firstly, all data is collected and extracted by using ETL (Extracted Tested Loaded) process. All the data is transformed according to the business rules, which is then loaded into the WEKA tool and algorithm is loaded and results are captured. Data is validated using this process. Cost/benefit and threshold curve algorithm used for data analysis shows good result of graph. Secondly, GUI (Graphical user Interface) is made in which data sets is loaded and queried and result is displayed so as to ensure whether the validated data is producing correct results. Thirdly Comparison is made using proposed algorithms Figure 4: Threshold curve for KNN Figure 5: Threshold curve for KNN Figure 6: Threshold curve for KNN Figure 2: threshold graph for naive bayes VI. CONCLUSION From our work, we have concluded that. Any bug found in later stages can affect the analysis of data. Hence it become very important to validate before presenting it to GUI, It can help the data analyst to do analysis at accurate rate and in simplified manner. This paper presents the iterative DWT life cycle. The paper shows 94.2% accuracy achieved with KNN algorithm in comparison with Naïve Bayes, which shows 89.5% accuracy which concludes that KNN gives better performance than Naïve Bayes. VII. FUTURE WORK ROADMAP In future, we plan to conduct the studies on various data gathering techniques and would work on data gathering and will focus more on building a data warehouse with more improved testing strategies. Figure 3: cost Benefit graph for Naïve Bayes ACKNOWLEGMENT I thanks to all the experts who have contributed towards this work. I would also like to dedicate my acknowledgment of gratitude towards the following significant advisors and contributors: 2025 http://ijesc.org/

I would like to thank Mr. Sanjay Srivastava for reading my research paper and providing valuable advices and for reproofing the paper. I sincerely thank to my parents, family, and friends, who provide the advice and financial support. The product of this research paper would not be possible without all of them. REFERENCES [1].Naveen ElGamal, Data Warehouse Testing, EDBT/ICDT, March 2013 [2].Kuldeep Deshpande, Model Based testing of Data warehouse, International Journal of Computer Science Issues (IJCSI), vol 10, Issue 2, March 2013 [3].Syntel, Proven Testing Techniques In Large Data Warehousing Projects : A white paper, Syntel 2012 [4].Rajini Jindal and Shweta Taneja, Comparitive Study Of Data Warehouse Design Approaches: A survey, International Jornal of Database Management System (IJDMS), vol February 2012 [5].Naveen ElGamal, Ali Ei Bastawissy and Galal Edeen, Towards A Data warehouse esting Framework, Ninth International Conference On ICT ad Knowledge Engineering, IEEE 2011 [13].Mookerjea A. and Malisetty P., 2008, Data Warehouse/ETL Testing: Best Practices, www.pureconferences.com. [14].Weka tutorials: http://sentimentmining.net/weka/ [15].www.wikipedia.com [16].Sneed M. Harry, 2006, Testing a Data Warehouse an Industrial Challenge, in proceedings of the Testing: Academic & Industrial Conference on Practice and Research Techniques, IEEE Computer, p. 203-210 Author Details Mrs. Surekha.M. MTech, Dept of CSE, MGM College of Engineering, Noida. Uttar Pradesh Technical University. Dr. Sanjay Srivastava Professor & HOD, Dept of CSE., MGM College of Engineering, Noida. Uttar Pradesh Technical University. [6].Manoj Philip Mathen, Data Warehouse Testing : a white paper, Infosys, March 2010 [7].Execution-MiH, Data Warehouse Testing is different. [8].Golfarelli M. and Rizzi S., 2009, A Comprehensive Approach to Data Warehouse Testing, in ACM 12th international workshop on Data Warehousing and OLAP (DOLAP 09), Hong Kong, China. Dr.Vineeta Khemchandani Assossiate professor, JSSATE,Noida Uttar Pradesh Technical University. [9] Muhammad Shahan Ali Khan and Ahmad ElMadi, Data Warehouse Testing an Exploratory Study, MS Thesis, School of Computing, Blekinge Institute of Technology, Karlskrona, Sweden, 2011. [10] Muhammad Shafique and Yvan Labiche, A Systematic Review of Model Based Testing Tool Support, Carleton University, Technical Report, 2010. [11] Tanuška, P., Moravčík, O., Važan, P. and Miksa, F.The Proposal of the Essential Strategies of Data Warehouse Testing. in 19th Central European Conference on Information and Intelligent Systems (CECIIS), (2008), 63-67. [12].Pooniah, P., 2001, Data Warehousing Fundamentals A Comprehensive Guide for IT Professionals, John Wiley & Sons, Inc 2026 http://ijesc.org/