Effective Use of SQL in SAS Programming




Yi Zhao, Merck & Co. Inc., Upper Gwynedd, Pennsylvania

INTRODUCTION

Structured Query Language (SQL) is a data manipulation tool that many SAS programmers are unaware of or uncomfortable with. SQL can often accomplish the same goal as SAS data steps with fewer lines of code and improved performance. This paper gives a brief introduction to relational databases and SQL syntax, followed by a variety of tips on how to use SQL effectively in SAS programming.

RELATIONAL DATABASE

SQL is primarily designed as a programming language for working with relational databases. Many of its features relate directly to database activities such as retrieving, updating, or deleting data. In a relational database, relations (tables) are associated with each other through primary keys and foreign keys. A primary key uniquely identifies each row in a table, and foreign keys are used to maintain the integrity of the database. Primary keys and foreign keys play a role similar to that of the variables used in a SAS by-merge data step.

A database schema describes the structure of tables and the relationships among them. SQL can be used to check the schema to find the variables that tables have in common and variable attributes such as data type and format. To make data retrieval or updating more efficient, SQL can create and use a database index. Similarly, Proc SQL can create an index and store it with a dataset, which helps when working with large datasets. Views, or virtual tables, can be created to manipulate data in the same way as they are created in SAS Proc SQL. In summary, knowing the basics of SQL in relational databases can help SAS programmers develop better SAS code.

SQL BASICS

SQL (Structured Query Language), developed by IBM in the early 1970s, is a standard interactive and programming language for querying and modifying data and for managing databases.
The basic syntax is shown in the following example:

    Select d.subjid, d.treat_cd, a.exam_val
    From demos d, assy a
    Where d.subjid = a.subjid
    Group by d.treat_cd
    Order by d.subjid;
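The clause order in the example above (SELECT, FROM, WHERE, GROUP BY, ORDER BY) can be tried outside SAS. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Proc SQL, not the paper's own code. The sample rows and the function name mean_exam_by_treatment are invented for illustration, and the aggregate functions COUNT and AVG are added (an assumption beyond the original query) so that GROUP BY returns one summary row per treatment code.

```python
import sqlite3

# In-memory database with two toy tables named after the paper's example.
# All data values below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demos (subjid INTEGER PRIMARY KEY, treat_cd TEXT);
    CREATE TABLE assy  (subjid INTEGER REFERENCES demos(subjid), exam_val REAL);
    INSERT INTO demos VALUES (101, 'A'), (102, 'B'), (103, 'A');
    INSERT INTO assy  VALUES (101, 1.5), (102, 2.0), (103, 2.5), (101, 0.5);
""")

def mean_exam_by_treatment(connection):
    """Join the two tables on subjid and summarize exam_val per treatment."""
    query = """
        SELECT d.treat_cd, COUNT(*) AS n_rows, AVG(a.exam_val) AS mean_val
        FROM demos d, assy a
        WHERE d.subjid = a.subjid
        GROUP BY d.treat_cd
        ORDER BY d.treat_cd
    """
    return connection.execute(query).fetchall()

rows = mean_exam_by_treatment(conn)
for row in rows:
    print(row)
```

SQLite and Proc SQL differ in details such as SAS functions and dataset options, so this sketch is only meant to show how the clauses fit together.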

Although SQL is both an ANSI and an ISO standard, many database products, such as Oracle, SQL Server, and MySQL, support SQL with proprietary extensions to the standard language. Proc SQL is the SAS version of SQL. Proc SQL adopts most of the standard SQL features and adds SAS ingredients such as dataset options and SAS functions. As a result, SAS SQL has the power of regular SQL plus many SAS-specific add-on features.

TERMINOLOGY

To help less-experienced SAS programmers understand the different terms used by database SQL programmers and SAS programmers, a comparison of these terms is displayed in Table 1 below:

Table 1: Comparison of SQL Terminology

    SAS Term        Database Term   SQL Term
    Dataset         Relation        Table
    Observation     Tuple           Row
    Variable        Attribute       Column
    Merge           Join            Join
    Missing value   NULL            NULL

USE OF SQL IN SAS

SAS uses SQL in two different ways: the Where statement and Proc SQL. The Where statement is one of the most commonly used SAS statements. Its concept and syntax, however, were originally adopted from SQL, which is one example of how SAS imports and mixes syntax from other languages.

Proc SQL is the main tool within SAS for using SQL. Although Proc SQL is a SAS procedure, it performs many functions similar to those of SAS data steps. For data manipulation, the data step and Proc SQL can often be used individually or interchangeably. The following sections outline four major areas in which Proc SQL can be used effectively.

I. Access Relational Database

In SAS, there are two approaches to accessing relational databases. One is the LIBNAME statement and the other is the pass-through facility. Below is an example of the pass-through facility. The code reads a demographic table from an Oracle database and outputs all allocated subjects.

    Proc sql;
      connect to odbc (dsn=&dsn uid=&uid pwd=&pwd);
      create table demo as
      select * from connection to odbc
        (select distinct allocation_number subjid,
                visit_number vt_num,
                age
         from std_demos
         where allocation_number is not null);
      disconnect from odbc;
    Quit;

Programming Tips:

- Get login credentials from interactive window input for security reasons.
- Do not use multiple joins to retrieve data; it is more efficient to use multiple CREATE TABLE statements.
- If possible, avoid ORDER BY to speed up execution.
- Use an index if available.

II. Create Macro Variables Using the Into Clause

SAS programmers often use %LET or the SAS routine CALL SYMPUT() to create macro variables. The following is an example:

    Data _null_;
      set dup nobs=obs;
      call symput('totdup', compress(put(obs, best.)));
    Run;

The same result can be achieved with the following SQL procedure:

    Proc sql noprint;
      select count(*) into :totdup from dup;
    Quit;

The Into clause stores the value of one or more columns in macro variable(s) for later use in another Proc SQL query or SAS statement. Below is an example:

    Proc sql noprint;
      select count(distinct treat_cd) into :tot_trt from sero_all;
      %let tot_trt = &tot_trt;  /* remove leading blanks before reuse */
      select distinct treat_cd into :_trt1 - :_trt&tot_trt from sero_all;
    Quit;

The above code creates a macro variable &TOT_TRT to store the total number of treatment groups, then creates macro variables &_TRT1, &_TRT2, and so on, and stores the names of the treatment groups in them. The total number of macro variables is determined by the value in &TOT_TRT. Following is another example, which uses the automatic macro variable &SQLOBS:

    Proc sql noprint;
      create table count_by as
        select distinct (&byvar) from datadir.&inds;
      select &byvar into :byv1 - :byv&sqlobs from count_by;
    Quit;

Programming Tips:

- &Sqlobs is an automatic macro variable that SAS sets to the number of rows processed by the most recent Proc SQL statement (here, the number of rows in count_by). It is similar to the NOBS= option in the data step.
- Use the Noprint option to prevent printing to the SAS listing.
- There is no need to repeat Proc SQL for each SQL statement.
- Separate variables with a comma, not a space.
- Use Distinct to select unique observations.
- Use Quit, not Run, at the end.

III. Merge (Join) Tables

The biggest advantage of a SQL join is that there is no need for sorting and renaming, which is especially useful when dealing with large datasets. The following is corresponding code for a by-merge data step and a SQL join:

Merge (Join)

Data step:

    Proc sort data = one; by subjid; run;
    Proc sort data = two (rename = (an_num = subjid)); by subjid; run;
    Data three;
      Merge one two;
      By subjid;
    Run;

Proc SQL:

    Proc sql;
      Create table three as
      Select *
      From one, two
      Where one.subjid = two.an_num;
    Quit;

There are two kinds of joins in SQL: inner joins and outer joins. An inner join returns a result table containing all the rows in a table that have one or more matching rows in the other table(s). The SQL example above is an implied inner join and can be rewritten with the explicit INNER JOIN keywords as shown below:

Inner Join

Data step:

    Proc sort data = one; by subjid; run;
    Proc sort data = two (rename = (an_num = subjid)); by subjid; run;
    Data three;
      Merge one(in=a) two(in=b);
      By subjid;
      If a and b;
    Run;

Proc SQL:

    Proc sql;
      Create table three as
      Select *
      From one INNER JOIN two
      ON one.subjid = two.an_num;
    Quit;

Outer joins are inner joins that have been augmented with rows that did not match any row from the other table in the join. The three types of outer joins are left, right, and full joins. Below are examples of outer joins:

Left Join

Data step:

    Proc sort data = one; by subjid; run;
    Proc sort data = two (rename = (an_num = subjid)); by subjid; run;
    Data three;
      Merge one(in=a) two(in=b);
      By subjid;
      If a;
    Run;

Proc SQL:

    Proc sql;
      Create table three as
      Select *
      From one LEFT JOIN two
      ON one.subjid = two.an_num;
    Quit;
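The difference between the inner join and the left join shown above is easiest to see on a few rows. The sketch below runs the same two join forms with Python's built-in sqlite3 module as a stand-in for Proc SQL. The column names site and score and all data values are hypothetical, chosen only for illustration. Note that neither input table is sorted and the key columns keep their different names, which is the advantage the section describes. ORDER BY is included only to make the printed result deterministic.

```python
import sqlite3

# Toy versions of tables "one" and "two"; rows are deliberately unsorted
# and the key columns have different names (subjid vs. an_num).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one (subjid INTEGER, site TEXT);
    CREATE TABLE two (an_num INTEGER, score REAL);
    INSERT INTO one VALUES (3, 'C'), (1, 'A'), (2, 'B');
    INSERT INTO two VALUES (2, 20.0), (4, 40.0), (1, 10.0);
""")

# Inner join: only subjects present in both tables are returned.
inner_rows = conn.execute("""
    SELECT one.subjid, one.site, two.score
    FROM one INNER JOIN two ON one.subjid = two.an_num
    ORDER BY one.subjid
""").fetchall()

# Left join: every row of "one" is kept; unmatched rows get NULL for score.
left_rows = conn.execute("""
    SELECT one.subjid, one.site, two.score
    FROM one LEFT JOIN two ON one.subjid = two.an_num
    ORDER BY one.subjid
""").fetchall()

print("inner:", inner_rows)
print("left: ", left_rows)
```

Subject 3 appears only in the left join result, with a NULL (Python None) score, and subject 4 appears in neither result because it exists only in table two.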

Right Join

Data step:

    Proc sort data = one; by subjid; run;
    Proc sort data = two (rename = (an_num = subjid)); by subjid; run;
    Data three;
      Merge one(in=a) two(in=b);
      By subjid;
      If b;
    Run;

Proc SQL:

    Proc sql;
      Create table three as
      Select *
      From one RIGHT JOIN two
      ON one.subjid = two.an_num;
    Quit;

A full outer join, specified with the keywords FULL JOIN and ON, returns all the rows from all the tables regardless of whether they match. The full outer join is rarely used in practice.

IV. Transform Data

SQL is frequently used to create new variables, rename variables, and order output. Suppose we have the following task at hand:

- Create a new variable new_v1 by concatenating v1 and v2
- Create a new variable new_v2 as the sum of v3
- Rename v4 and v5 as out4 and out5
- Output only new_v1, new_v2, out4, out5, and v3, in that order, in the output dataset

Here is the code:

    Proc sql;
      create table new as
      select v1 || v2 as new_v1,
             sum(v3) as new_v2,
             v4 as out4,
             v5 as out5,
             v3
      from old;
    Quit;

Programming Tips:

- SAS dataset options such as keep, drop, and rename, as well as SAS functions, can be used within Proc SQL. Here is an example:

      %let label=this is the label;
      Proc sql;
        create table one (label="&label" drop=subject_no center) as
        select *
        from tx t1, scores t2
        where t1.subject_no=input(substr(t2.subject_id,5),8.)
          and t1.center=input(substr(t2.subject_id,1,3),8.);
      Quit;

- Use a subquery or an in-line view. A query-expression is called a subquery when it is used in a WHERE or HAVING clause, nested as part of another query-expression. An in-line view is a special subquery used in the FROM clause. Subqueries are useful in situations such as identifying patients who are older than the average age of all patients and who experienced an adverse event:

      Select subjid, birth_dt, age, gender
      from std_demos
      where age > (select avg(age) from std_demos)
        and subjid in (select distinct subjid from std_ae)
      order by subjid;

- Use set operators such as UNION, INTERSECT, and EXCEPT.
- Avoid unintended Cartesian products, which arise when tables are joined without a join condition, much like a SAS merge without by variable(s).

CONCLUSION

Proc SQL is more powerful and efficient than SAS data steps in certain cases, with fewer lines of code. SQL is a basic tool for many job functions that involve working with databases. Mastering SQL could result in project (or job) opportunities and enhance career growth. Proc SQL must be used wisely, however, or it can become complicated and inefficient. In summary, Proc SQL is an excellent alternative to non-SQL Base SAS, making it worth the programmer's time to explore its use.

REFERENCES

Feng, Ying. "Tips for Using SQL: When to Use and How?" Proceedings of the 18th Annual NorthEast SAS Users Group Conference, POS12, 2005.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

AUTHOR CONTACT INFORMATION

Yi Zhao
Senior Scientific Programming Analyst
Merck Research Laboratories
UG1CD-38
PO Box 1000
North Wales, PA 19454
Phone: 267-305-7672
Email: yi_zhao@merck.com