Automated ETL Testing with Py.Test
Kevin A. Smith, Senior QA Automation Engineer, Cambia Health Solutions

2 Agenda
- Overview
- Testing Data Quality
- Design for Automation & Testability
- Python and Py.Test
- Examples

3 Database Applications
Taking data out of an OnLine Transaction Processing (OLTP) system and putting it into an OnLine Analytical Processing (OLAP) system involves Extracting the data, Transforming the data, and then Loading that data into another database (ETL).
When testing an ETL application, the steps are:
- Extract
- Transform
- Compare

4 Cambia EVR Application
[figure not reproduced]

5 Testing Data Quality
- Data Completeness: Ensures that all expected data is loaded.
- Data Integrity: Ensures that the ETL application rejects invalid data, substitutes default values for it, or corrects and reports it.
- Data Transformation: Ensures that all data is transformed correctly according to business rules and/or design specifications.
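As a minimal sketch of the data completeness check described above, the following compares the keys extracted from the source with the keys loaded into the target. The function name `find_completeness_gaps`, the two-column key, and the sample rows are hypothetical and chosen only for illustration.

```python
def find_completeness_gaps(source_rows, target_rows, key_columns=(0, 1)):
    """Return (missing, unexpected) key sets for two lists of row tuples."""
    def make_key(row):
        # Same composite-key idea as the slides: join the key columns with "_".
        return "_".join(str(row[i]) for i in key_columns)

    source_keys = {make_key(row) for row in source_rows}
    target_keys = {make_key(row) for row in target_rows}
    missing = source_keys - target_keys      # expected rows that never loaded
    unexpected = target_keys - source_keys   # rows with no matching source
    return missing, unexpected


# Hypothetical sample data: (id, code, amount)
source = [(1, "A", 10.0), (2, "B", 20.0), (3, "C", 30.0)]
target = [(1, "A", 10.0), (3, "C", 30.0), (4, "D", 40.0)]

missing, unexpected = find_completeness_gaps(source, target)
print("missing from target:", sorted(missing))
print("unexpected in target:", sorted(unexpected))
```

A completeness test would then assert that both sets are empty.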

6 Testing Techniques: Stare & Compare
Validate data transformations manually. This step is usually required to bootstrap an ETL test automation project.

7 Testing Techniques: Golden Files
Use well-known test data and golden file comparison as a testing oracle. This technique is very powerful for automated testing of printed output.
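A golden-file comparison can be sketched with the standard library's `difflib`. This is an illustration rather than the presenter's implementation: the helper name `compare_to_golden` and the sample text are assumptions, and a real test would read both strings from files.

```python
import difflib


def compare_to_golden(actual_text, golden_text):
    """Return unified-diff lines; an empty list means the output matches."""
    return list(difflib.unified_diff(
        golden_text.splitlines(),
        actual_text.splitlines(),
        fromfile="golden",
        tofile="actual",
        lineterm="",
    ))


# Hypothetical golden file content and two candidate outputs.
golden = "id,name,amount\n1,A,10.00\n2,B,20.00\n"
matching_output = "id,name,amount\n1,A,10.00\n2,B,20.00\n"
changed_output = "id,name,amount\n1,A,10.00\n2,B,25.00\n"

# A matching output produces no diff lines.
assert compare_to_golden(matching_output, golden) == []

# A changed output produces a readable diff for the test report.
diff = compare_to_golden(changed_output, golden)
for line in diff:
    print(line)
```

Returning the diff instead of a bare boolean lets the test report show exactly which lines drifted from the golden file.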

8 Testing Techniques: Self-Verifying
The oracle is a model built into the test scripts. This is necessary if you want to test any aspect of the ETL application running in production.

9 Design for Automation
- Control: How well the application can be controlled from the test tools.
- Visibility: How well intermediate data and results are visible to the test tools.

10 Design for Testability - Visibility
[figure not reproduced]

11 Test Tools - Rules of Thumb
1. Do not re-invent the wheel.
2. No test tool will do everything you need - customize.
3. No one test tool will solve all of your test problems - use a tool box.
4. Do not expect your business experts or developers to be able to create great tests, even with tools.
5. Do not use one-off technology for testing.
6. Do not use the built-in test module of your ETL development tool.

12 Tool Requirements
- Support Customization
- Support Source to Target Data Mapping
- Support Complex Logical Calculations
- Support database connections
- Support CSV and XML
- Existing Tool
- Customizable
- Leverage Existing Knowledge
- Multi-OS (AIX, Windows)

13 Python and Py.Test
- Support for Oracle and Sybase databases with 3rd-party libraries: PyODBC, cx_Oracle
- Native support for CSV files and XML
- Strong support for containers (Tuple, List, Dict)
- Easy learning curve for non-programmers

14 What is Py.Test?
- Searches Disk for Tests
- Sequences and Executes Tests
- Captures Output
- Captures Exceptions
- Reports Results
- Interfaces to Extend/Customize Behavior
  - Command Line Processing
  - Test Search/Sequencing/Selecting
  - Test Handling (Fixtures)
  - Reporting
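To make the discovery and reporting behavior concrete, here is a minimal test module of the kind pytest collects. The file name `test_example.py` and the `transform_name` function are hypothetical stand-ins for a real transformation rule.

```python
# test_example.py
# By default pytest collects files named test_*.py and runs every
# function in them whose name starts with "test_".


def transform_name(raw_name):
    """Hypothetical transformation rule: trim whitespace and upper-case."""
    return raw_name.strip().upper()


def test_transform_trims_and_uppercases():
    # A plain assert is enough; pytest reports the compared values on failure.
    assert transform_name("  smith ") == "SMITH"


def test_transform_leaves_clean_values_alone():
    assert transform_name("JONES") == "JONES"
```

Running `pytest -v` in the directory containing this file executes both tests and reports each result; `pytest -k trims` would select only the first one by name.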

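15 Database Support

    import cx_Oracle

    results = {}

    # Connect, then run the extraction query
    conn = cx_Oracle.connect(user_name, password, server_name)
    crsr = conn.cursor()
    query_string = """<<<embedded sql statement>>>"""
    crsr.execute(query_string)

    # Key each source row by its first two columns; the target slot
    # stays marked as missing until the target data is loaded
    for row in crsr.fetchall():
        key = str(row[0]) + '_' + str(row[1])
        results[key] = {'source': row, 'target': ('Missing',)}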
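The same cursor pattern can be tried without an Oracle server. The sketch below substitutes the standard library's `sqlite3` module, which follows the same DB-API style, for cx_Oracle; the `claims` table, its columns, and its rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the real source system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (claim_id INTEGER, line_no INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [(100, 1, 25.0), (100, 2, 40.0), (101, 1, 12.5)])

results = {}
crsr = conn.cursor()
crsr.execute(
    "SELECT claim_id, line_no, amount FROM claims ORDER BY claim_id, line_no")
for row in crsr.fetchall():
    # Composite key from the first two columns, as on the slide.
    key = str(row[0]) + "_" + str(row[1])
    results[key] = {"source": row, "target": ("Missing",)}
crsr.close()
conn.close()

print(results["100_2"])
```

Because the loop only relies on `cursor()`, `execute()` and `fetchall()`, it should carry over to other DB-API drivers with little more than a change to the connect call.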

16 CSV File Support

    import csv

    results = {}

    # The delimiter character was lost in the transcript; a comma is assumed
    csv_data = csv.reader(open('data.csv', newline=''), delimiter=',')
    for row in csv_data:
        key = str(row[0]) + '_' + str(row[1])
        results[key] = {'source': row, 'target': ('Missing',)}

17 Row Comparison

    for value in results.values():
        assert value['source'] == value['target']
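The loop on this slide stops at the first differing row. One possible variation, sketched here with hypothetical helper names and sample data, collects every mismatch first so that a single run reports all of them.

```python
def find_mismatches(results):
    """Return {key: (source_row, target_row)} for every differing pair."""
    return {
        key: (value["source"], value["target"])
        for key, value in results.items()
        if value["source"] != value["target"]
    }


def assert_rows_match(results):
    """Print every mismatch, then fail once if any were found."""
    mismatches = find_mismatches(results)
    for key in sorted(mismatches):
        source, target = mismatches[key]
        print(key, "source:", source, "target:", target)
    assert not mismatches, str(len(mismatches)) + " row(s) differ"


# Hypothetical in-memory results in the layout used by the slides.
results = {
    "100_1": {"source": (100, 1, 25.0), "target": (100, 1, 25.0)},
    "100_2": {"source": (100, 2, 40.0), "target": (100, 2, 45.0)},
    "101_1": {"source": (101, 1, 12.5), "target": ("Missing",)},
}

mismatches = find_mismatches(results)
print(sorted(mismatches))
```

Calling `assert_rows_match(results)` inside a test function would fail with the message "2 row(s) differ" for this sample data, after printing both offending rows.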

18 Test Patterns
- Database Schema
- Row Counting
- Simple Source to Target Mapping
- Complex Source to Target Mapping

19 Database Schema

    table_names = ('OUTPUT_CD_TRNSLTN', 'OUTPUT_DRAG_DT',
                   'OUTPUT_NTWK', 'OUTPUT_PH_NUM')

    def test_dev_schema():
        """ Test the development database. """
        schemas = []
        crsr = Database.get_cursor('DEV')
        for table in table_names:
            schemas.append(get_table_dict(crsr, 'dev', table, out_dir, base_dir))
        crsr.close()
        generic_schema_compare(schemas, 'Development')

20 Database Schema (cont'd)

    def generic_schema_compare(results, title):
        """ Generic table comparison test. """
        test_rslt = True
        for schema in results:
            if schema['source'] != schema['target']:
                schema['source'].show_diffs(schema['target'])
                test_rslt = False
        assert test_rslt, title + ' schema differences'
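The slides do not show how `get_table_dict` or `show_diffs` are implemented. As a rough, self-contained illustration of the same idea, the sketch below reads column definitions from SQLite's `table_info` pragma and lists the differences between two tables. Every table, column, and function name here is an assumption, and a real Oracle or Sybase suite would query that database's own catalog instead.

```python
import sqlite3


def get_table_schema(conn, table):
    """Map column name -> declared type via SQLite's table_info pragma.

    The table name is concatenated into the statement, so it must come
    from a trusted list such as the test's own table_names tuple.
    """
    rows = conn.execute("PRAGMA table_info(" + table + ")").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2] for row in rows}


def schema_diffs(source, target):
    """List human-readable differences between two schema dicts."""
    diffs = []
    for column in sorted(set(source) | set(target)):
        if source.get(column) != target.get(column):
            diffs.append("%s: source=%s target=%s"
                         % (column, source.get(column), target.get(column)))
    return diffs


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baseline_ntwk (ntwk_id INTEGER, ntwk_nm TEXT)")
conn.execute(
    "CREATE TABLE output_ntwk (ntwk_id INTEGER, ntwk_nm VARCHAR(40), eff_dt TEXT)")

diffs = schema_diffs(get_table_schema(conn, "baseline_ntwk"),
                     get_table_schema(conn, "output_ntwk"))
for line in diffs:
    print(line)
conn.close()
```

A schema test would assert that `diffs` is empty, in the same spirit as `generic_schema_compare` above.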

21 Row Counting

    crsr.execute("""
        Select count(*)
        From FEP_PMT.FEP_CLM
        Where FDS_BAT_ID = :arg_1
          and DISP_CD in ('1','2','9')
          and AMT_PAID < 0""", arg_1 = fds_bat_id)
    # count(*) always returns exactly one row
    actual = crsr.fetchone()[0]
    assert actual == 0, 'Negative claims found, invalid incoming data'
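Row counting is also commonly used to reconcile a source table against its target. The following sketch uses `sqlite3` as a stand-in database; the `stg_clm` and `dw_clm` tables, the `bat_id` column, and the batch number are hypothetical.

```python
import sqlite3

# Stand-in database with a staging (source) and a warehouse (target) table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_clm (bat_id INTEGER, amt_paid REAL)")
conn.execute("CREATE TABLE dw_clm (bat_id INTEGER, amt_paid REAL)")
rows = [(7, 10.0), (7, 20.0), (8, 5.0)]
conn.executemany("INSERT INTO stg_clm VALUES (?, ?)", rows)
conn.executemany("INSERT INTO dw_clm VALUES (?, ?)", rows)


def count_rows(conn, table, bat_id):
    """Count the rows of one batch; table must be a trusted name."""
    query = "SELECT count(*) FROM " + table + " WHERE bat_id = :arg_1"
    # Named bind parameter, in the same :arg_1 style as the slide.
    return conn.execute(query, {"arg_1": bat_id}).fetchone()[0]


def test_batch_row_counts_match():
    expected = count_rows(conn, "stg_clm", 7)
    actual = count_rows(conn, "dw_clm", 7)
    assert actual == expected, "Row count mismatch for batch 7"
```

Binding the batch ID as a parameter, rather than formatting it into the SQL text, keeps the query reusable across batches.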

22 Complex Source to Target

    test_result = True
    for key, val in get_claim_lines.items():
        expected_contract_adj_amt = 0
        # calculate the expected contractual adjustment amount
        # walk the fields by field name
        for i in range(1, 6):
            # calculate the base name of this hag "row"
            hag_base_name = 'HAG' + str(i) + '_ADJ_'
            if val[hag_base_name + 'CDE'] == 'CO':
                if (val[hag_base_name + 'RSN1'] != '23' and
                        val[hag_base_name + 'RSN1'] != '171'):
                    expected_contract_adj_amt += val[hag_base_name + 'AMT1']
                if (val[hag_base_name + 'RSN2'] != '23' and
                        val[hag_base_name + 'RSN2'] != '171'):
                    expected_contract_adj_amt += val[hag_base_name + 'AMT2']
        # now compare the calculation to the amount retrieved from the table
        if round(val['cntrctl_adjstmt_amt'], 4) != round(expected_contract_adj_amt, 4):
            print('claim_trans_disp_line: ' + key + ' did not calculate correctly.')
            print('actual:', round(val['cntrctl_adjstmt_amt'], 4),
                  'Expected:', round(expected_contract_adj_amt, 4))
            print()
            test_result = False
    assert test_result, 'Incorrect contractual adjustment calculations'

23 In-memory Data Representation

    key = str(row[0]) + '_' + str(row[1])

    # create a dict to hold in-memory tables of source and target data
    results = {}
    results = {'key 1': {'source': row, 'target': row},
               'key 2': {'source': row, 'target': row,
                         'source 1': row, 'source 2': row},
               'key 3': {'source': row, 'target': ('Missing',)},
               'key 4': {'source': ('Added',), 'target': row}}
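One way such a dictionary might be built is sketched below. The `build_results` helper and the sample rows are hypothetical; the `('Missing',)` and `('Added',)` markers follow the slide.

```python
def make_key(row):
    """Composite key from the first two columns, as on the slide."""
    return str(row[0]) + "_" + str(row[1])


def build_results(source_rows, target_rows):
    """Pair source and target rows by key, marking unmatched rows."""
    results = {}
    # Every source row starts out with its target marked as missing.
    for row in source_rows:
        results[make_key(row)] = {"source": row, "target": ("Missing",)}
    # Target rows either fill an existing slot or count as added.
    for row in target_rows:
        key = make_key(row)
        if key in results:
            results[key]["target"] = row
        else:
            results[key] = {"source": ("Added",), "target": row}
    return results


# Hypothetical sample data: (id, code, amount)
source = [(1, "A", 10.0), (2, "B", 20.0)]
target = [(1, "A", 10.0), (4, "D", 40.0)]

results = build_results(source, target)
for key in sorted(results):
    print(key, results[key])
```

With this layout, a missing or added row fails an ordinary source-to-target equality check without any special handling.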

24 Customized Test Output
[figure not reproduced]

25 Customizations
- Shared Database Connection Pool
  - Database connection parameters, including obfuscated login information
- INI-file Processing
  - File directories for XML, CSV, baseline and output logging files
  - Default values for command line options, such as logical database name mapping
- Command Line Option Processing
  - Batch ID
  - Database Names
- Standard Test Routines
  - Source to Target Mapping
  - Database Schema Testing
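The INI-file and command-line customizations could look roughly like the sketch below, which reads defaults with the standard library's `configparser` and registers options through pytest's `pytest_addoption` hook. The option names `--batch-id` and `--database`, the INI section names, and the values are assumptions for illustration, not the presenter's actual configuration.

```python
import configparser

# Hypothetical INI content; a real suite would read this from a file.
INI_TEXT = """
[directories]
baseline = ./baseline
output = ./output

[defaults]
database = DEV
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)


def pytest_addoption(parser):
    """pytest hook, normally placed in conftest.py, that adds options."""
    parser.addoption("--batch-id", action="store", default=None,
                     help="ETL batch ID to test")
    # The INI file supplies the default logical database name.
    parser.addoption("--database", action="store",
                     default=config.get("defaults", "database"),
                     help="logical database name")
```

A test or fixture can then read a value with `request.config.getoption("--batch-id")`, so the same suite runs against a different batch or database without code changes.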

26 Team
- James Bass, UTi
- William Buse, Cambia Health Solutions
- Matthew Pierce, Cambia Health Solutions
- Venkatesh Marada, Cambia Health Solutions
- Kanthi Kondreddi, Cambia Health Solutions
- Bhargavi Kanakamedala, Cambia Health Solutions
- Tim Rilling, Cambia Health Solutions
- Gordon Krenn, Cambia Health Solutions
- Tim Peterson, Cambia Health Solutions

27 Upcoming Work
- Detailed XML File Tests
- Test Results Load Directly to Rally
- Golden-file Comparison with Definable Filtering
- Golden File Comparison for PostScript

28 References
- Python: http://www.python.org/ and http://en.wikipedia.org/wiki/python_(programming_language)
- Py.Test: http://www.pytest.org/
- Oracle Python Library: http://cx-oracle.sourceforge.net/html/
- Python ODBC Library: https://code.google.com/p/pyodbc/
- Companion paper: http://tinyurl.com/kofo3rv/