WORKING WITH SUBQUERY IN THE SQL PROCEDURE




Lei Zhang, Domain Solutions Corp., Cambridge, MA
Danbo Yi, Abt Associates, Cambridge, MA

ABSTRACT

PROC SQL is a major contribution to the SAS/BASE system. One of the powerful features of the SQL procedure is the subquery, which provides great flexibility in manipulating and querying data in multiple tables simultaneously. However, the subquery is also the subtlest part of the SQL procedure, and users have to understand the correct way to use subqueries in a variety of situations. This paper explores how subqueries work with different components of SQL such as the WHERE, HAVING, FROM, SELECT, and SET clauses. It also demonstrates when and how to convert subqueries into JOIN solutions to improve query performance.

INTRODUCTION

One of the powerful features of the SAS SQL procedure is its subqueries, which allow a SELECT statement to be nested inside the clauses of another SELECT, inside an INSERT, UPDATE, or DELETE statement, or even inside another subquery. Depending on the clause that contains it, a subquery can select a single value or multiple values from related tables. If more than one subquery is used in a query expression, the innermost query is evaluated first, then the next innermost query, and so on. PROC SQL allows a subquery (contained in parentheses) to be used at any point in a query expression, but the user must understand when and where to use uncorrelated or correlated subqueries, non-scalar or scalar subqueries, or combinations of them. This paper discusses subquery usage in the following sections:

Subquery in Where clause
Subquery in Having clause
Subquery in From clause
Subquery in Select clause
Subquery in SET clause
Subquery and JOINs

The examples in this paper refer to an imaginary set of clinical trial data, but the techniques apply to other industries as well. There are three SAS data sets in the example database:

Data set DEMOG contains variables PATID (Patient ID), AGE, and SEX.
Data set VITAL contains variables PATID, VISID (Visit ID), VISDATE (Visit date), DIABP, and SYSBP (the patient's diastolic and systolic blood pressure, which is measured twice at a visit).

Data set VITMEAN contains the patient's mean diastolic and systolic blood pressure at each visit, as recorded by doctors, in variables PATID, VISID, VISDATE, MDIABP, and MSYSBP.

The SAS source code that creates the three data sets is listed in the appendix.

SUBQUERY IN THE WHERE CLAUSE

We begin with the WHERE clause because it is the most common place to use subqueries. Consider the following two queries.

Example 1: Find patients who have records in table DEMOG but not in table VITAL.

    select patid
    from demog
    where patid not in (select patid from vital);

    PATID
    -----
    150

Example 2: Find patients who didn't have their blood pressures measured twice at a visit.

    select patid
    from vital T1
    where 2 > (select count(*)
               from vital T2
               where T1.patid = T2.patid
                 and T1.visid = T2.visid);

    PATID
    -----
    110
    140

In Example 1, the SELECT expression in parentheses is an uncorrelated subquery. It is an inner query that is evaluated before the outer (main) query, and its results do not depend on the values in the main query. It is not a scalar subquery because it returns multiple values.

In Example 2, the SELECT expression in parentheses is a correlated subquery because it refers to the values of variables T1.PATID and T1.VISID in table T1 of the outer query. A correlated subquery is evaluated for each row in the outer query; with correlated subqueries, PROC SQL executes the subquery and the outer query together. This subquery is also a scalar subquery because the aggregate function COUNT(*) always returns one value in one column for each row of values passed in from the outer query.

Multiple subqueries can be used in the SQL WHERE clause.

Example 3: Find patients who have the maximum number of visits, that is, patients who attended every visit recorded in VITMEAN.

    select distinct patid
    from vitmean as a
    where not exists (select visid from vitmean
                      except
                      select visid from vitmean as b
                      where a.patid = b.patid);

    PATID
    -----
    120

Below is a nested correlated subquery solution to Example 3.

    select distinct patid
    from vitmean as a
    where not exists
          (select distinct visid
           from vitmean as b
           where not exists
                 (select visid
                  from vitmean as c
                  where a.patid = c.patid
                    and b.visid = c.visid));

This solution uses a double negative structure. The original question can be put another way: find each patient for whom there is no visit that the patient did not have. The two nested subqueries produce the list of visits that a given patient did not have. The main query returns those patients for whom the result table of the subquery is empty. The SQL procedure determines for each patient separately whether the subquery yields no result.

In the WHERE clause, the IN, ANY, ALL, and EXISTS predicates can take any type of subquery (see Examples 1 and 3 above). However, only scalar subqueries can be used with the relation operators (>, =, <, etc.) and with the BETWEEN and LIKE predicates. One way to turn a subquery into a scalar subquery is to use an aggregate function. Consider Example 4 below, in which two scalar subqueries are used in the BETWEEN predicate.

Example 4: Find patients whose age is within 5 years of the average age.
    select patid
    from demog
    where age between (select avg(age) from demog) - 5
                  and (select avg(age) from demog) + 5;

    PATID
    ------
    110

SUBQUERY IN THE HAVING CLAUSE

Users can use one or more subqueries in the HAVING clause as long as they do not produce multiple values. This means that the outermost subquery in the HAVING clause must be scalar, either uncorrelated or correlated. The following query is an example.

Example 4: Find patients who didn't have the maximum number of visits.

    select patid
    from vitmean as a
    group by patid
    having count(visid) < (select max(cnt)
                           from (select count(visid) as cnt
                                 from vitmean
                                 group by patid));

    PATID
    ------
    100
    110
    130
    140

The scalar subquery in this command computes the maximum number of visits first; the main query then finds the patients whose number of visits is less than the maximum value calculated by the subquery. For a correlated scalar subquery in the HAVING clause, see Example 6 in the next section.

SUBQUERY IN THE FROM CLAUSE

Subqueries can be used in the SQL FROM clause to form in-line views, because the SQL procedure can treat a subquery as a temporary table. The HAVING clause example above already does this: the subquery in its FROM clause is an in-line view.

In-line views are views defined in the FROM clause. They are extended temporary tables constructed with UNION, INTERSECT, EXCEPT, and JOINs from tables and subqueries. An in-line view cannot be correlated with its main query, but the outer (main) query can refer to any values in the tables and subqueries that make up the in-line view. The main advantages of in-line views are better SQL performance and fewer SQL statements. Consider Example 5.

Example 5: Calculate the interval between this visit and the previous visit for each patient visit.

    proc sql;
        select a.patid, a.visid,
               intck('day', b.visdate, a.visdate) as interval
        from (select distinct patid, visid, visdate
              from vital where visid > '1') as a,
             (select distinct patid, visid, visdate
              from vital where visid < '4') as b
        where input(a.visid, 2.) - 1 = input(b.visid, 2.)
          and a.patid = b.patid
        order by a.patid, a.visid;
    quit;

    PATID   VISID   INTERVAL
    ----------------------------
    100     2        7
    100     3        7
    120     2        8
    120     3        6
    120     4       34
    130     2       35
    130     3       38
    140     2        9

In Example 5, two subqueries are joined to form an in-line view in the FROM clause. The first one holds all required patient visit data with visit ID > 1, and the second one holds all required patient visit data with visit ID < 4. They are joined on PATID and on consecutive VISID values to calculate the interval between this visit and the previous visit. The result is shown above.
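To make the saving in SQL statements concrete, the interval query of Example 5 can be sketched without in-line views, using explicit intermediate tables instead. This is only an illustration under stated assumptions: the table names WORK.CUR and WORK.PREV are hypothetical and are not part of the example database.

```sas
proc sql;
    /* Step 1: materialize what the first in-line view (alias a) selects.
       WORK.CUR is a hypothetical table name. */
    create table work.cur as
        select distinct patid, visid, visdate
        from vital
        where visid > '1';

    /* Step 2: materialize what the second in-line view (alias b) selects.
       WORK.PREV is a hypothetical table name. */
    create table work.prev as
        select distinct patid, visid, visdate
        from vital
        where visid < '4';

    /* Step 3: join each visit to the visit numbered one lower
       for the same patient and compute the gap in days. */
    select a.patid, a.visid,
           intck('day', b.visdate, a.visdate) as interval
    from work.cur as a, work.prev as b
    where input(a.visid, 2.) - 1 = input(b.visid, 2.)
      and a.patid = b.patid
    order by a.patid, a.visid;

    /* Step 4: remove the intermediate tables. */
    drop table work.cur, work.prev;
quit;
```

The in-line view form expresses the same logic in a single SELECT and leaves no intermediate tables to create or drop.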
Example 6 below is a more complicated case, in which two subqueries are joined by VISID and PATID to form an in-line view. The results from the in-line view are then used by the main (outer) query to obtain the pairs of patients who have exactly the same visits.

Example 6: Find patients who have exactly the same visits.

    proc sql;
        /* One row per patient visit. The table name VISITS is illustrative;
           a separate name keeps VITAL intact for the later examples. */
        create table visits as
            select distinct patid, visid from vital;

        select T1.patid as patid, T2.patid as patid1
        from (select patid, visid from visits) as T1
             inner join
             (select patid, visid from visits) as T2
             on T1.visid = T2.visid and T1.patid < T2.patid
        group by T1.patid, T2.patid
        having count(*) = (select count(*) from visits as T3
                           where T3.patid = T1.patid)
           and count(*) = (select count(*) from visits as T4
                           where T4.patid = T2.patid);
    quit;

RESULT:

    PATID   PATID1
    -----------------
    100     130

SUBQUERY IN THE SELECT CLAUSE

SAS SQL can use a scalar subquery, or even a correlated scalar subquery, in the SELECT clause. If the subquery result is empty, the result is a missing value. This feature lets you write a LEFT JOIN as a query within the SELECT clause. Here is an example.

Example 7: Compute the difference between the recorded mean of SYSBP from the VITMEAN table and the calculated mean of SYSBP from VITAL for each patient visit.

    select a.patid, a.visid,
           a.msysbp - (select avg(sysbp)
                       from vital as b
                       where a.patid = b.patid
                         and a.visid = b.visid) as diff
    from vitmean as a;

    PATID   VISID   DIFF
    ---------------------
    100     1        0
    100     2        0
    100     3        0
    110     1        0
    110     3        0
    120     1        0
    120     2       34
    120     3        7.5
    120     4        0
    130     1       -0.5
    130     2       -0.5
    130     3       -0.5
    140     1        0
    140     2        0

The subquery in the SELECT clause is a correlated scalar subquery. It calculates the average systolic blood pressure from the VITAL table for each patient visit in table VITMEAN. The main query then uses the result to calculate the difference between the recorded and the calculated systolic blood pressure for each patient visit. The process is repeated until all patient visits in table VITMEAN are checked.

SUBQUERY IN THE SET CLAUSE

You can use a scalar subquery, correlated or not, in the SET clause of the UPDATE statement to modify a table based on itself or on another table. Here is an example.

Example 8: Modify MDIABP and MSYSBP in VITMEAN using values calculated from table VITAL.

    proc sql;
        update vitmean as a
        set mdiabp = (select avg(b.diabp)
                      from vital as b
                      where a.patid = b.patid and a.visid = b.visid),
            msysbp = (select avg(c.sysbp)
                      from vital as c
                      where a.patid = c.patid and a.visid = c.visid);
    quit;
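One caveat applies to correlated subqueries in the SET clause. When a row of the updated table has no matching rows in the source table, the subquery yields a missing value and the existing value is overwritten with missing. In the example data every visit in VITMEAN has measurements in VITAL, so this does not occur here. The following is a minimal sketch of a guarded update, assuming that unmatched rows should keep their recorded means. The alias d and the WHERE EXISTS guard are additions for illustration.

```sas
proc sql;
    update vitmean as a
    set mdiabp = (select avg(b.diabp)
                  from vital as b
                  where a.patid = b.patid and a.visid = b.visid),
        msysbp = (select avg(c.sysbp)
                  from vital as c
                  where a.patid = c.patid and a.visid = c.visid)
    /* Guard (illustrative): update only the visits that have at least
       one measurement row in VITAL; other rows are left unchanged. */
    where exists (select *
                  from vital as d
                  where a.patid = d.patid and a.visid = d.visid);
quit;
```

The guard is itself a correlated subquery in a WHERE clause, so it follows the same rules as the EXISTS predicates discussed earlier.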

SUBQUERY AND JOINS

Some users find subqueries difficult to read and consequently fear that they are difficult to use. Indeed, many subqueries can be replaced by one or more join queries. However, if you can't construct an equivalent join query, the only alternative is to use a series of queries and pass the results from one query to another, which is less efficient than using subqueries. Subqueries can perform wonders of selection and comparison, so that one statement with nested subqueries can replace several statements without subqueries. This improved efficiency can even affect performance.

Generally, whether to use a join or a subquery is a matter of individual preference: they can often be used interchangeably to solve a given problem. Users will find that subqueries exceed the capabilities of joins for certain queries, and in some cases a subquery is the shortest route to the results a user needs. However, if users can find an equivalent join query for a subquery, the join can replace the subquery to improve performance. For example, the JOIN solution to Example 1 is:

    proc sql;
        select a.patid
        from demog as a
             left join
             (select distinct patid from vital) as b
             on a.patid = b.patid
        where b.patid is null;
    quit;

The JOIN solution to Example 7 is:

    select a.patid, a.visid,
           max(a.msysbp) - avg(b.sysbp) as diff
    from vitmean as a
         left join vital as b
         on a.patid = b.patid and a.visid = b.visid
    group by a.patid, a.visid;

In some situations it may be worthwhile to rewrite such subqueries as JOINs, but the development and maintenance of such code may make it less desirable than the subquery form, because a JOIN solution may improve performance at the cost of readability.
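A rewrite does not always need a join. As a further illustration, which is not one of the original examples, the correlated subquery of Example 2 can be replaced by grouping VITAL by patient and visit and keeping the groups with fewer than two measurement rows. This is a sketch under the assumption that each patient should be listed once; Example 2 as written returns one row per qualifying measurement row.

```sas
proc sql;
    /* Patients with at least one visit that has fewer than two
       measurement rows in VITAL. DISTINCT lists each patient once
       even when several of the patient's visits qualify. */
    select distinct patid
    from vital
    group by patid, visid
    having count(*) < 2;
quit;
```

With the appendix data this should list patients 110 and 140, the same patients that Example 2 reports.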
CONCLUSIONS

We summarize the usage of subqueries in the different SQL components as follows:

    SQL Component                          Subquery Usage
    -------------------------------------  ----------------------------------------
    Relation operator (<, =, >, etc.),     Scalar subquery only
      BETWEEN, LIKE
    IN, ANY, ALL, SOME, EXISTS             Any type of subquery
    FROM clause                            Uncorrelated subquery (in-line view)
    HAVING clause                          Uncorrelated/correlated scalar subquery
    SELECT clause                          Uncorrelated/correlated scalar subquery
    SET clause                             Uncorrelated/correlated scalar subquery

SAS and SAS/BASE mentioned in this paper are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

REFERENCES

SAS Institute, Inc. (1992), SAS Guide to the SQL Procedure: Usage and Reference, Version 6, First Edition.
SAS Institute, Inc. (1994), SAS Technical Report P-222, Changes and Enhancements to Base SAS Software, Release 6.07.
Getting Started with the SQL Procedure, First Edition.

AUTHORS' ADDRESSES

Lei Zhang
DomainPharma Inc.
10 Maguire Road
Lexington, MA 02140, USA
Tel: 781-778-3880
Fax: 781-778-3700
e-mail: lzhang@domainpharma.com

Danbo Yi
Abt Associates Inc.
55 Wheeler Street
Cambridge, MA 02138-1168, USA
Tel: 617-349-2346
Fax: 617-520-2940
e-mail: danbo_yi@abtassoc.com

APPENDIX: EXAMPLE SAS DATA SETS

data demog;
  input patid $1-3 age 5-6 sex $ 8;
datalines;
100 34 M
110 45 F
120 67 M
130 55 M
140 40 F
150 38 M
;
run;

data vital;
  /* The colon modifier reads the date up to the next blank,
     so the 8-character dates do not run into the next field. */
  input patid $ visid $ visdate :mmddyy10. diabp sysbp;
  format visdate mmddyy10.;
datalines;
100 1 01/12/97 66 110
100 1 01/12/97 68 110
100 2 01/19/97 66 .
100 2 01/19/97 66 110
100 3 01/26/97 68 120
100 3 01/26/97 68 120
110 1 02/03/97 68 110
110 3 02/12/97 64 115
110 3 02/12/97 68 115
120 1 04/05/97 66 105
120 1 04/05/97 64 105
120 2 04/13/97 110 64
120 2 04/13/97 105 82
120 3 04/19/97 63 105
120 3 04/19/97 105 90
120 4 05/23/97 90 106
120 4 05/23/97 92 108
130 1 02/12/97 66 111
130 1 02/12/97 67 110
130 2 03/19/97 66 109
130 2 03/19/97 66 110
130 3 04/26/97 68 121
130 3 04/26/97 68 118
140 1 02/03/97 68 110
140 2 02/12/97 64 115
140 2 02/12/97 68 115
;
run;

data vitmean;
  input patid $ visid $ visdate mmddyy8. mdiabp msysbp;
  format visdate mmddyy10.;
cards;
100 1 01/12/97 67 110
100 2 01/19/97 66 110
100 3 01/26/97 68 120
110 1 02/03/97 68 110
110 3 02/17/97 66 115
120 1 04/05/97 65 105
120 2 04/12/97 64 107
120 3 04/19/97 65 105
120 4 05/23/97 91 107
130 1 02/12/97 66 110
130 2 03/19/97 66 109
130 3 04/26/97 68 119
140 1 02/03/97 68 110
140 2 02/12/97 66 115
;
run;