Improving Maintenance and Performance of SQL queries



Paper CC06

Bas van Bakel, OCS Consulting, Rosmalen, The Netherlands
Rick Pagie, OCS Consulting, Rosmalen, The Netherlands

ABSTRACT

Almost all programmers and many end users are familiar with the term SQL, or even with writing SQL queries. The challenge comes with querying and joining numerous large tables: performance drops rapidly, maintainability decreases and debugging can become very complicated. This paper introduces ground rules for programming SQL queries in a way that makes them easy to maintain and debug. It also shows that a query developed in the proposed way performs enormously better than other queries.

INTRODUCTION

This paper relates to using the PROC SQL procedure to select data from a number of tables; readers are assumed to be familiar with SQL syntax. The paper mentions the EXPLAIN command, the EXPLAIN PLAN statement, and the IDXWHERE and IDXNAME data set options for indexes. The EXPLAIN command can be used, for example, with IBM Query Management Facility (QMF) or with SAS/ACCESS Interface to Teradata to find out how an SQL query uses indexes and tables. A similar statement, EXPLAIN PLAN, can be used in Oracle. The IDXWHERE and IDXNAME options can be used within SAS. The idea behind these options, the EXPLAIN command and the EXPLAIN PLAN statement is to make sure the query uses indexes wherever possible.

TRADITIONAL SQL PROGRAMMING

Almost all programming languages support SQL, and therefore nearly all programmers will eventually come in contact with SQL-based queries. Although the number of statements available in SQL is very limited, it is very easy to produce spaghetti code and code that performs badly. Maintenance costs of SQL code can sometimes be reduced simply by making the code clearer to other programmers.
This can be achieved by adding more comments to your code and by avoiding constructs such as SELECT * FROM table a; because such a query does not show which variables of the table are actually used. Often several tables are listed in the FROM clause at the same time, which can result in badly performing queries. Moreover, in the case of an error, it can be very difficult to find out what causes it:

SELECT a.var1, a.var2, b.var3, c.var4
  FROM table1 a, table2 b, table3 c
 WHERE a.var5 = b.var3
   AND a.var4 = c.var7
   AND (c.var2 = 'N' OR b.var8 < a.var6);

METHOD OF DEVELOPING SQL QUERIES: ENCAPSULATING RIGHT JOINS

Reducing maintenance costs starts with the development of the code. If the code is developed in a concise way, maintenance will be as easy as possible. This chapter introduces a specific method of creating SQL code that is easy to develop and easy to maintain: the Encapsulating Join method.

To demonstrate this method, data from a study in which the bodyweight of subjects is measured at several time points will be used. Suppose we would like to create a table containing the bodyweight of all subjects of a specific trial, the start date of treatment of those subjects and the description of the study. The study descriptions can be found in the dataset PROJECT, which contains one record per study. The dataset DEMO contains the start date of treatment for each subject in each study, and the table BODYWEIG contains, for each study and each subject, the bodyweight at several time points. With the help of the following ground rules, the requested table will be generated according to the Encapsulating Join method. This is a simple example in which information from only 3 tables is joined, but this way of developing can easily be extended to more tables and more complex code.

GROUND RULES

Rule 1: Make sure indexes are available, and are used if they make the query run faster. You want to keep the number of indexes to a minimum to reduce disk space and update costs, but indexes can reduce query time a lot. In general, indexes are useful for queries that retrieve a relatively small number of rows (less than 15%). Indexing small tables, or columns with a small number of distinct values, generally does not result in a performance gain. Our PROJECT table is quite small, so we do not create an index on it. The DEMO table and especially the BODYWEIG table, however, are large and contain more than one hundred different studies, so an index is created on the column PROJECT. Within SAS, this index can be checked with the PROC CONTENTS procedure or on the Indexes tab of the properties of the BODYWEIG dataset. The benefit of using the index named PROJECT is illustrated by two test runs.
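The idea behind Rule 1 can also be seen outside SAS. The following minimal sketch uses SQLite, through Python's standard sqlite3 module, as a stand-in for PROC SQL. The table contents and the index name project_idx are hypothetical. SQLite's NOT INDEXED and INDEXED BY clauses play roughly the role of IDXWHERE=NO and IDXNAME=. EXPLAIN QUERY PLAN plays roughly the role of MSGLEVEL=I or EXPLAIN, because it reports whether the index is used:

```python
import sqlite3

# Hypothetical stand-in for REPORT.BODYWEIG: 100 studies x 50 subjects.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bodyweig (project TEXT, subjno INTEGER, weight REAL)")
con.executemany(
    "INSERT INTO bodyweig VALUES (?, ?, ?)",
    [(f"{p:06d}", s, 70.0) for p in range(100) for s in range(1, 51)],
)
# Hypothetical index name; SAS would call a simple index after its column.
con.execute("CREATE INDEX project_idx ON bodyweig (project)")


def plan(sql):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN for sql."""
    return " | ".join(str(row[-1]) for row in con.execute("EXPLAIN QUERY PLAN " + sql))


where = "WHERE project = '000042'"
# Comparable to IDXWHERE=NO: NOT INDEXED forbids the index (full scan).
no_index_sql = f"SELECT * FROM bodyweig NOT INDEXED {where}"
# Comparable to IDXNAME=PROJECT: INDEXED BY insists on project_idx.
named_index_sql = f"SELECT * FROM bodyweig INDEXED BY project_idx {where}"
# No hint: the planner decides, as SAS does when neither option is given.
default_sql = f"SELECT * FROM bodyweig {where}"

for sql in (no_index_sql, named_index_sql, default_sql):
    rows = con.execute(sql).fetchall()
    print(len(rows), "rows |", plan(sql))
```

All three queries return the same 50 rows. Only the access path differs. The exact plan wording varies between SQLite versions (for example "SCAN TABLE bodyweig" versus "SCAN bodyweig"). Timings on a table this small say nothing about the SAS figures reported in this paper.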
In the SAS test runs below, the IDXWHERE and IDXNAME data set options specify, respectively, whether or not SAS should use indexes and which index should be used. If these options are not specified, SAS decides by itself whether an index is used. The MSGLEVEL=I option writes information about index usage to the log.

/* Test run 1: do not use indexes */
OPTIONS MSGLEVEL = I;
PROC SQL;
  CREATE TABLE test AS
  SELECT *
    FROM report.bodyweig(idxwhere=no)
   WHERE project = '123456';
QUIT;

INFO: Data set option (IDXWHERE=NO) forced a sequential pass of the data rather than use of an index for where-clause processing.
NOTE: Table WORK.TEST created, with 5672 rows and 22 columns.
      real time           59.88 seconds
      cpu time            0.91 seconds

/* Test run 2: use the index PROJECT */
OPTIONS MSGLEVEL = I;

PROC SQL;
  CREATE TABLE test AS
  SELECT *
    FROM report.bodyweig(idxname=project)
   WHERE project = '123456';
QUIT;

INFO: Index PROJECT selected for WHERE clause optimization.
NOTE: Table WORK.TEST created, with 5672 rows and 22 columns.
      real time           0.08 seconds
      cpu time            0.08 seconds

As you can see, the processing time is reduced drastically when the index is used (from 59.88 to 0.08 seconds real time).

In QMF you can also use the EXPLAIN command to find out whether your query uses the indexes in the right way, and to get more information about how the tables are joined. A similar statement, EXPLAIN PLAN, is available in Oracle and helps you investigate the query execution plan. The EXPLAIN command cannot be used in SAS. However, the undocumented options _METHOD and _TREE, which can be specified on the PROC SQL statement, give more information about the hierarchy of processing methods that will be chosen.

Rule 2: Query only one table at a time. Querying one table at a time results in more readable code and makes step-by-step development of your SQL code possible.

Rule 3: Start by querying the table that restricts the number of resulting rows the most. The BODYWEIG dataset reduces the number of rows the most. It is therefore the first and only table to be queried in this step. Once it is determined which columns are needed, the first part of the total query can be created:

SELECT bw.project,
       bw.subjno
       /* , further BODYWEIG columns needed for the report */
  FROM report.bodyweig(idxname=project) bw
 WHERE bw.project = '123456';

Rule 4: Expand your query with information from the other tables by selecting the columns needed from those tables and joining them with a RIGHT JOIN.
The start date of treatment, which is available in the DEMO table, can be joined on the variables PROJECT and SUBJNO, so the query can be expanded:

SELECT de.start_da,   /* Start date of treatment */
       ONE.*
  FROM report.demo(idxname=project) de
       RIGHT JOIN
       (SELECT bw.project,
               bw.subjno
               /* , further BODYWEIG columns needed for the report */
          FROM report.bodyweig(idxname=project) bw
         WHERE bw.project = '123456'
       ) ONE
    ON ONE.project = de.project
   AND ONE.subjno = de.subjno;

Note that the first query is included completely in the RIGHT JOIN ( ... ) part of the second query: the join encapsulates the entire first query. The FROM clause should always point to a table or view, and the RIGHT JOIN should point to the result of the previous query. Never use SELECT * when selecting from a table or view. You can use * to select all variables of a previous query, because those variables have already been listed explicitly there.
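The two steps above can be sketched outside SAS, again with SQLite (via Python's sqlite3 module) standing in for PROC SQL. The miniature tables are hypothetical, and the weight column is an assumption. The step-1 text is pasted unchanged into step 2, which is the essence of Rules 4 and 5. SQLite supports RIGHT JOIN only from version 3.39.0, so the sketch writes the equivalent LEFT JOIN with the two sides swapped. Both forms keep every row of the encapsulated query:

```python
import sqlite3

# Hypothetical miniature versions of DEMO and BODYWEIG ('weight' is assumed).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE demo     (project TEXT, subjno INTEGER, start_da TEXT);
CREATE TABLE bodyweig (project TEXT, subjno INTEGER, samdate TEXT, weight REAL);
INSERT INTO demo VALUES ('123456', 1, '2007-01-10'),
                        ('123456', 2, '2007-01-12'),
                        ('999999', 1, '2007-02-01');
INSERT INTO bodyweig VALUES ('123456', 1, '2007-01-10', 71.0),
                            ('123456', 1, '2007-02-10', 70.2),
                            ('123456', 2, '2007-01-12', 82.5),
                            ('123456', 3, '2007-01-15', 65.0),
                            ('999999', 1, '2007-02-01', 90.0);
""")

# Step 1 (Rule 3): query only the most restrictive table, columns listed explicitly.
one = """SELECT bw.project, bw.subjno, bw.samdate, bw.weight
           FROM bodyweig bw
          WHERE bw.project = '123456'"""

# Step 2 (Rule 4): encapsulate the unchanged step-1 query and join DEMO to it.
# "FROM demo de RIGHT JOIN (...) ONE" is written here as the equivalent
# "FROM (...) ONE LEFT JOIN demo de", which older SQLite versions accept.
two = f"""SELECT de.start_da, ONE.*
            FROM ({one}) ONE
                 LEFT JOIN demo de
              ON ONE.project = de.project
             AND ONE.subjno = de.subjno"""

rows_one = con.execute(one).fetchall()
rows_two = con.execute(two).fetchall()
print(len(rows_one), len(rows_two))  # 4 4: every step-1 row is kept
for row in rows_two:
    print(row)
```

Subject 3 has no DEMO record. Its row is therefore kept, with a start date of None rather than being dropped. A third step would wrap two in the same way with the PROJECT table and leave one and two untouched.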

The last information to be added is the study information from the PROJECT table. This is done in exactly the same way (i.e. by putting a RIGHT JOIN around the previous query):

SELECT pr.header,   /* Study information */
       TWO.*
  FROM report.project pr
       RIGHT JOIN
       (SELECT de.start_da,   /* Start date of treatment */
               ONE.*
          FROM report.demo(idxname=project) de
               RIGHT JOIN
               (SELECT bw.project,
                       bw.subjno
                       /* , further BODYWEIG columns needed for the report */
                  FROM report.bodyweig(idxname=project) bw
                 WHERE bw.project = '123456'
               ) ONE
            ON ONE.project = de.project
           AND ONE.subjno = de.subjno
       ) TWO
    ON TWO.project = pr.project;

Rule 5: If you have created a partial query that works as desired, do not change it. Composing a query this way makes not only maintenance but also development easier. After each step, you can view the output of your query and determine whether the result is as expected. If you then add an extra RIGHT JOIN and the query no longer runs or gives incorrect results, only the outer query has to be checked for incorrect code.

COMPARING THIS METHOD OF DEVELOPING SQL CODE WITH USING A SUB QUERY

Sub queries are usually an inefficient way of programming. This applies in particular to correlated sub queries such as the one below, which in principle have to be evaluated again for every row of the outer query. A sub query may be more readable, but its performance is often very poor compared to other ways of writing the same SQL code. If performance is a key issue, you should probably avoid sub queries. The following example illustrates the difference between the Encapsulating Join method and the use of a sub query. In this example, only the last available bodyweight measurement of each subject is kept.

/* Use of sub queries */
PROC SQL;
  CREATE TABLE test AS
  SELECT bw.project,
         bw.subjno,
         bw.grouno,
         bw.samdate
         /* , further BODYWEIG columns needed for the report */
    FROM report.bodyweig bw
   WHERE bw.samdate = (SELECT MAX(bw2.samdate)
                         FROM report.bodyweig bw2
                        WHERE bw.project = bw2.project
                          AND bw.subjno  = bw2.subjno
                          AND bw.grouno  = bw2.grouno);
QUIT;

NOTE: Table WORK.TEST created, with 83544 rows and 6 columns.
      real time           7:25.53
      cpu time            7:07.28
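Before comparing timings, it is worth confirming that the sub query and the Encapsulating Join alternative (shown in SAS next) select the same rows. The following sketch checks this on a few hypothetical BODYWEIG rows. It again uses SQLite via Python's sqlite3 module as a stand-in for PROC SQL, with an assumed weight column. The paper's RIGHT JOIN is written as a LEFT JOIN with the sides swapped, because RIGHT JOIN requires SQLite 3.39.0 or later:

```python
import sqlite3

# Hypothetical BODYWEIG rows ('weight' is assumed); ISO dates compare correctly as text.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bodyweig (project TEXT, subjno INTEGER, grouno INTEGER,"
            " samdate TEXT, weight REAL)")
con.executemany("INSERT INTO bodyweig VALUES (?, ?, ?, ?, ?)", [
    ("123456", 1, 1, "2007-01-10", 71.0),
    ("123456", 1, 1, "2007-02-10", 70.2),
    ("123456", 2, 1, "2007-01-12", 82.5),
    ("123456", 2, 2, "2007-03-01", 81.9),
    ("999999", 1, 1, "2007-02-01", 90.0),
])

# Correlated sub query: MAX() is tied to each outer row through bw.
subquery = """
SELECT bw.project, bw.subjno, bw.grouno, bw.samdate, bw.weight
  FROM bodyweig bw
 WHERE bw.samdate = (SELECT MAX(bw2.samdate)
                       FROM bodyweig bw2
                      WHERE bw.project = bw2.project
                        AND bw.subjno  = bw2.subjno
                        AND bw.grouno  = bw2.grouno)"""

# Encapsulating join: compute MAX() once per group in ONE, then join back.
join = """
SELECT bw.project, bw.subjno, bw.grouno, bw.samdate, bw.weight
  FROM (SELECT project, subjno, grouno, MAX(samdate) AS samdate
          FROM bodyweig
         GROUP BY project, subjno, grouno) ONE
       LEFT JOIN bodyweig bw
    ON bw.project = ONE.project AND bw.subjno = ONE.subjno
   AND bw.grouno  = ONE.grouno  AND bw.samdate = ONE.samdate"""

a = sorted(con.execute(subquery).fetchall())
b = sorted(con.execute(join).fetchall())
print(a == b, len(a))  # True 4
```

Neither formulation breaks ties. If a subject has two records on the same latest SAMDATE, both versions keep both records.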

/* Use of encapsulating join method */
PROC SQL;
  CREATE TABLE test AS
  SELECT bw.project,
         bw.subjno,
         bw.grouno,
         bw.samdate
         /* , further BODYWEIG columns needed for the report */
    FROM report.bodyweig bw
         RIGHT JOIN
         (SELECT bw2.project,
                 bw2.subjno,
                 bw2.grouno,
                 MAX(bw2.samdate) AS samdate
            FROM report.bodyweig bw2
           GROUP BY bw2.project, bw2.subjno, bw2.grouno
         ) ONE
      ON bw.project = ONE.project
     AND bw.subjno  = ONE.subjno
     AND bw.grouno  = ONE.grouno
     AND bw.samdate = ONE.samdate;
QUIT;

NOTE: Table WORK.TEST created, with 83544 rows and 6 columns.
      real time           1:00.04
      cpu time            7.52 seconds

The CPU time (7.52 seconds versus 7:07.28 for the sub query) clearly demonstrates the benefit of joining compared to using a sub query.

CONCLUSION

There are many ways to program a specific SQL query, and each of them has its own advantages and disadvantages. The performance of a query depends highly on the way the tables are joined and on whether or not indexes are defined. Furthermore, the maintenance and development time of a query depends highly on the readability of the code and on how easily it can be debugged. The Encapsulating Join method described in this paper provides a uniform way of programming SQL. It leads to well-performing queries that are easy to debug, because each query is developed step by step. Queries developed with this method may be longer than other queries. Once you are accustomed to this method of developing SQL code, however, they are easy to read and therefore easy to maintain.

ACKNOWLEDGMENTS

We would like to take this opportunity to thank Yves Poriau (OCS Consulting) for contributing to our paper.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Bas van Bakel
Rick Pagie
OCS Consulting
PO BOX 490
5240 AL ROSMALEN
THE NETHERLANDS
Office: +31 (0)73 523 6000
Fax: +31 (0)73 523 6600
sasquestions@ocs-consulting.com
www.ocs-consulting.com

Brand and product names are trademarks of their respective companies.