Automated ETL Testing with Py.Test
Kevin A. Smith
Senior QA Automation Engineer
Cambia Health Solutions

Agenda
- Overview
- Testing Data Quality
- Design for Automation & Testability
- Python and Py.Test
- Examples

Database Applications

Taking data out of an OnLine Transaction Processing (OLTP) system and putting it into an OnLine Analytical Processing (OLAP) system involves Extracting the data, Transforming the data and then Loading that data into another database (ETL).

When testing an ETL application:
- Extract
- Transform
- Compare

[Figure: Cambia EVR Application]

Testing Data Quality
- Data Completeness: Ensures that all expected data is loaded.
- Data Integrity: Ensures that the ETL application rejects, substitutes default values for, or corrects and reports invalid data.
- Data Transformation: Ensures that all data is transformed correctly according to business rules and/or design specifications.

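The three data-quality categories above can be sketched as pytest-style checks. This is only an illustration on made-up in-memory rows: the column layout, the "UNKNOWN" default and the cents-to-dollars rule are assumptions for the example, not rules taken from the application described in these slides.

```python
# Hypothetical source and target rows keyed by claim id: (state, amount).
SOURCE = {
    "C1": ("OR", 1250),   # amount in cents
    "C2": (None, 300),    # invalid state, expected to be defaulted
    "C3": ("WA", 0),
}
TARGET = {
    "C1": ("OR", 12.50),  # amount in dollars
    "C2": ("UNKNOWN", 3.00),
    "C3": ("WA", 0.00),
}


def test_data_completeness():
    """Every source key is loaded and nothing extra appears."""
    assert set(SOURCE) == set(TARGET)


def test_data_integrity():
    """Missing state codes are replaced by the (assumed) default value."""
    for key, (state, _) in SOURCE.items():
        expected = state if state is not None else "UNKNOWN"
        assert TARGET[key][0] == expected, key


def test_data_transformation():
    """Amounts are converted from cents to dollars (assumed business rule)."""
    for key, (_, cents) in SOURCE.items():
        assert round(TARGET[key][1], 2) == round(cents / 100, 2), key
```

Each function covers one category, so a failure report points directly at the kind of data-quality problem that was found.
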
Testing Techniques
- Stare & Compare: Validate data transformations manually. This step is usually required to bootstrap an ETL test automation project.

Testing Techniques
- Golden Files: Use well-known test data and golden file comparison as a testing oracle. This technique is very powerful for automated testing of printed output.

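A golden-file check can be sketched with the standard library alone. The helper names and file paths below are hypothetical; the sketch assumes plain-text output and reports a unified diff when the actual file departs from the golden copy.

```python
import difflib
from pathlib import Path


def diff_against_golden(actual_path, golden_path):
    """Return unified-diff lines between the golden file and the actual output."""
    golden = Path(golden_path).read_text().splitlines(keepends=True)
    actual = Path(actual_path).read_text().splitlines(keepends=True)
    return list(difflib.unified_diff(golden, actual,
                                     fromfile=str(golden_path),
                                     tofile=str(actual_path)))


def assert_matches_golden(actual_path, golden_path):
    """Fail with a readable diff if the output differs from the golden file."""
    diff = diff_against_golden(actual_path, golden_path)
    assert not diff, "Output differs from golden file:\n" + "".join(diff)
```

A test would call `assert_matches_golden("output/report.txt", "baseline/report.txt")` after the ETL run; both paths are placeholders.
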
Testing Techniques
- Self-Verifying: The oracle is in the test scripts, which model the expected results. Necessary if you want to test any aspect of the ETL application running in production.

Design for Automation
- Control: How well the application can be controlled from the test tools.
- Visibility: How well intermediate data and results are visible to the test tools.

[Figure: Design for Testability - Visibility]

Test Tools - Rules of Thumb
1. Do not re-invent the wheel.
2. No test tool will do everything you need; customize.
3. No one test tool will solve all of your test problems; build a tool box.
4. Do not expect your business experts or developers to be able to create great tests, even with tools.
5. Do not use one-off technology for testing.
6. Do not use the built-in test module of your ETL development tool.

Tool Requirements
- Support customization
- Support source to target data mapping
- Support complex logical calculations
- Support database connections
- Support CSV and XML
- Existing tool, customizable
- Leverage existing knowledge
- Multi-OS (AIX, Windows)

Python and Py.Test
- Supports Oracle and Sybase databases with third-party libraries: PyODBC, cx_Oracle
- Native support for CSV files and XML
- Strong support for containers (tuple, list, dict)
- Easy learning curve for non-programmers

What is Py.Test?
- Searches disk for tests
- Sequences and executes tests
- Captures output
- Captures exceptions
- Reports results
- Interfaces to extend/customize behavior:
  - Command line processing
  - Test search/sequencing/selecting
  - Test handling (fixtures)
  - Reporting

Database Support

    import cx_Oracle

    results = {}
    conn = cx_Oracle.connect(user_name, password, server_name)
    crsr = conn.cursor()
    query_string = """<<<embedded sql statement>>>"""
    crsr.execute(query_string)
    for row in crsr.fetchall():
        # composite key built from the first two columns
        key = str(row[0]) + '_' + str(row[1])
        results[key] = {'source': row, 'target': ('Missing',)}

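CSV File Support

    import csv

    # the delimiter character was lost from the slide text; a space is assumed here
    with open('data.csv', newline='') as csv_file:
        csv_data = csv.reader(csv_file, delimiter=' ')
        for row in csv_data:
            # composite key built from the first two columns
            key = str(row[0]) + '_' + str(row[1])
            results[key] = {'source': row, 'target': ('Missing',)}
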
Row Comparison

    for value in results.values():
        assert value['source'] == value['target']

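A bare assert stops at the first differing row. One possible variation, sketched here with hypothetical helper names and made-up rows, collects every mismatch first so that a single failure message lists all affected keys.

```python
def compare_rows(results):
    """Return a list of (key, source, target) for every mismatched row."""
    return [(key, pair["source"], pair["target"])
            for key, pair in sorted(results.items())
            if pair["source"] != pair["target"]]


def assert_rows_match(results):
    """Fail once, reporting all mismatched keys together."""
    mismatches = compare_rows(results)
    report = "\n".join(f"{key}: source={src!r} target={tgt!r}"
                       for key, src, tgt in mismatches)
    assert not mismatches, f"{len(mismatches)} row(s) differ:\n{report}"


results = {
    "1_A": {"source": (1, "A", 10), "target": (1, "A", 10)},
    "2_B": {"source": (2, "B", 20), "target": ("Missing",)},
}
print(compare_rows(results))  # [('2_B', (2, 'B', 20), ('Missing',))]
```
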
Test Patterns
- Database Schema
- Row Counting
- Simple Source to Target Mapping
- Complex Source to Target Mapping

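The simple source-to-target mapping pattern can be sketched as a column-by-column comparison driven by a mapping table. The column names and the helper below are hypothetical; the sketch assumes that mapped values are copied unchanged.

```python
# Hypothetical mapping: source column name -> target column name.
COLUMN_MAP = {"clm_id": "CLAIM_ID", "mbr_id": "MEMBER_ID", "paid": "AMT_PAID"}


def mapping_errors(source_row, target_row, column_map=COLUMN_MAP):
    """Compare one source row to one target row, column by column."""
    errors = []
    for src_col, tgt_col in column_map.items():
        if source_row[src_col] != target_row[tgt_col]:
            errors.append(f"{src_col} -> {tgt_col}: "
                          f"{source_row[src_col]!r} != {target_row[tgt_col]!r}")
    return errors


def test_simple_mapping():
    source_row = {"clm_id": 100, "mbr_id": "M7", "paid": 25.0}
    target_row = {"CLAIM_ID": 100, "MEMBER_ID": "M7", "AMT_PAID": 25.0}
    assert mapping_errors(source_row, target_row) == []
```

Keeping the mapping in a dict means a new column is covered by adding one entry rather than writing another assertion.
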
Database Schema

    table_names = ('OUTPUT_CD_TRNSLTN', 'OUTPUT_DRAG_DT',
                   'OUTPUT_NTWK', 'OUTPUT_PH_NUM')

    def test_dev_schema():
        """ Test the development database. """
        schemas = []
        crsr = Database.get_cursor('DEV')
        for table in table_names:
            schemas.append(get_table_dict(crsr, 'dev', table, out_dir, base_dir))
        crsr.close()
        generic_schema_compare(schemas, 'Development')

Database Schema (cont'd)

    def generic_schema_compare(results, title):
        """ Generic table comparison test. """
        test_rslt = True
        for schema in results:
            if schema['source'] != schema['target']:
                schema['source'].show_diffs(schema['target'])
                test_rslt = False
        assert test_rslt, title + ' schema differences'

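The slides do not show how the table description is read or how differences are reported. The sketch below illustrates the same idea using SQLite from the standard library as a stand-in for the real databases; the helper names, column names and baseline dict are hypothetical.

```python
import sqlite3


def get_table_schema(conn, table):
    """Return {column_name: declared_type} for one table (SQLite only)."""
    # The table name is interpolated, so it must come from a trusted list.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1]: row[2] for row in rows}  # row = (cid, name, type, ...)


def schema_diffs(baseline, actual):
    """List readable differences between two schema dicts."""
    diffs = []
    for col in sorted(set(baseline) | set(actual)):
        if baseline.get(col) != actual.get(col):
            diffs.append(f"{col}: baseline={baseline.get(col)} "
                         f"actual={actual.get(col)}")
    return diffs


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OUTPUT_NTWK (NTWK_ID INTEGER, NTWK_NM TEXT)")
baseline = {"NTWK_ID": "INTEGER", "NTWK_NM": "TEXT", "EFF_DT": "TEXT"}
print(schema_diffs(baseline, get_table_schema(conn, "OUTPUT_NTWK")))
# ['EFF_DT: baseline=TEXT actual=None']
```
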
Row Counting

    crsr.execute("""
        Select count(*)
        From FEP_PMT.FEP_CLM
        Where FDS_BAT_ID = :arg_1
          and DISP_CD in ('1','2','9')
          and AMT_PAID < 0""",
        arg_1=fds_bat_id)
    # count(*) always returns exactly one row
    actual = crsr.fetchone()[0]
    assert actual == 0, 'Negative claims found, invalid incoming data'

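Row counting can also serve as a completeness check by comparing the number of rows per batch in source and target. The sketch below uses an in-memory SQLite database as a stand-in; the table names, column names and batch id are invented for the example.

```python
import sqlite3

BATCH_ID = 7  # hypothetical batch under test

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_clm (clm_id INTEGER, bat_id INTEGER);
    CREATE TABLE tgt_clm (clm_id INTEGER, bat_id INTEGER);
    INSERT INTO src_clm VALUES (1, 7), (2, 7), (3, 8);
    INSERT INTO tgt_clm VALUES (1, 7), (2, 7);
""")


def count_rows(conn, table, bat_id):
    """Count the rows of one batch; the table name must come from trusted code."""
    sql = f"SELECT count(*) FROM {table} WHERE bat_id = :bat_id"
    return conn.execute(sql, {"bat_id": bat_id}).fetchone()[0]


def test_batch_row_counts_match():
    source_count = count_rows(conn, "src_clm", BATCH_ID)
    target_count = count_rows(conn, "tgt_clm", BATCH_ID)
    assert source_count == target_count, 'Row counts differ for batch'
```
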
Complex Source to Target

    test_result = True
    for key, val in get_claim_lines.items():
        # calculate the expected contractual adjustment amount
        expected_contract_adj_amt = 0
        # walk the fields by field name
        for i in range(1, 6):
            # calculate the base name of this hag "row"
            hag_base_name = 'HAG' + str(i) + '_ADJ_'
            if val[hag_base_name + 'CDE'] == 'CO':
                if (val[hag_base_name + 'RSN1'] != '23' and
                        val[hag_base_name + 'RSN1'] != '171'):
                    expected_contract_adj_amt += val[hag_base_name + 'AMT1']
                if (val[hag_base_name + 'RSN2'] != '23' and
                        val[hag_base_name + 'RSN2'] != '171'):
                    expected_contract_adj_amt += val[hag_base_name + 'AMT2']
        # now compare the calculation to the amount retrieved from the table
        if round(val['cntrctl_adjstmt_amt'], 4) != round(expected_contract_adj_amt, 4):
            print('claim_trans_disp_line: ' + key + ' did not calculate correctly.')
            print('actual:', round(val['cntrctl_adjstmt_amt'], 4),
                  'Expected:', round(expected_contract_adj_amt, 4))
            print()
            test_result = False
    assert test_result, 'Incorrect contractual adjustment calculations'

In-memory Data Representation

    key = str(row[0]) + '_' + str(row[1])

    # create a dict to hold in-memory tables of source and target data
    results = {}
    results = {'key 1': {'source': row, 'target': row},
               'key 2': {'source': row, 'target': row,
                         'source 1': row, 'source 2': row},
               'key 3': {'source': row, 'target': ('Missing',)},
               'key 4': {'source': ('Added',), 'target': row}}

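One way such a dict could be filled is sketched below: source rows are entered first with a "Missing" target, then target rows either complete an existing entry or create a new one marked "Added". The helper names and sample rows are hypothetical.

```python
def make_key(row):
    """Build the composite key from the first two columns."""
    return str(row[0]) + "_" + str(row[1])


def build_results(source_rows, target_rows):
    """Merge source and target rows into one dict keyed by composite key."""
    results = {}
    for row in source_rows:
        results[make_key(row)] = {"source": row, "target": ("Missing",)}
    for row in target_rows:
        # A target row without a source row is flagged as added.
        entry = results.setdefault(make_key(row), {"source": ("Added",)})
        entry["target"] = row
    return results


source_rows = [(1, "A", 10), (2, "B", 20)]
target_rows = [(1, "A", 10), (3, "C", 30)]
results = build_results(source_rows, target_rows)
```

After this merge, a plain source-versus-target comparison over `results.values()` flags both the missing row and the added row, because neither sentinel tuple equals a real row.
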
[Figure: Customized Test Output]

Customizations
- Shared Database Connection Pool
  - Database connection parameters, including obfuscated login information
- INI-file Processing
  - File directories for XML, CSV, baseline and output logging files
  - Default values for command line options, such as logical database name mapping
- Command Line Option Processing
  - Batch ID
  - Database Names
- Standard Test Routines
  - Source to Target Mapping
  - Database Schema Testing

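INI-file processing of this kind can be sketched with the standard `configparser` module. The section names, directory paths and server names below are invented for illustration; a real project would read its own file with `config.read(...)`.

```python
import configparser

# Hypothetical INI content, inlined so the sketch is self-contained.
INI_TEXT = """
[directories]
baseline = /data/etl/baseline
output = /data/etl/output

[databases]
DEV = dev_server
TEST = test_server
"""

config = configparser.ConfigParser()
config.optionxform = str  # keep option names case-sensitive (DEV, TEST)
config.read_string(INI_TEXT)


def physical_db_name(logical_name):
    """Map a logical database name such as 'DEV' to the configured server."""
    return config["databases"][logical_name]


print(physical_db_name("DEV"))  # dev_server
```
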
Team
- James Bass, UTi
- William Buse, Cambia Health Solutions
- Matthew Pierce, Cambia Health Solutions
- Venkatesh Marada, Cambia Health Solutions
- Kanthi Kondreddi, Cambia Health Solutions
- Bhargavi Kanakamedala, Cambia Health Solutions
- Tim Rilling, Cambia Health Solutions
- Gordon Krenn, Cambia Health Solutions
- Tim Peterson, Cambia Health Solutions

Upcoming Work
- Detailed XML file tests
- Test results load directly to Rally
- Golden-file comparison with definable filtering
- Golden-file comparison for PostScript

References
- Python: http://www.python.org/ and http://en.wikipedia.org/wiki/python_(programming_language)
- Py.Test: http://www.pytest.org/
- Oracle Python Library: http://cx-oracle.sourceforge.net/html/
- Python ODBC Library: https://code.google.com/p/pyodbc/
- Companion paper: http://tinyurl.com/kofo3rv/