Proc SQL A Powerful Tool in SAS Swetha Kongara, PVR Technologies Inc Raja Panchumarthi, Smith Hanley Consulting Group Inc


ABSTRACT

Proc SQL is a powerful tool in the SAS system that can be used in a variety of ways. Its uses include creating SAS data sets or data views, macro variables, and data listings. The power of SQL lies in its ability to combine the functionality of several procedures into one step. You can combine the data from multiple data sets, calculate and integrate multiple summary statistics, and sort the resulting data set in one step. This paper explains how to create SAS data sets or data views, macro variables, and data listings.

INTRODUCTION

The intent of this document is to describe the syntax of a PROC SQL step and to provide the user with a description of many of the useful statements and capabilities of the procedure. This document is not intended to be a complete reference for PROC SQL. For more detailed information, please refer to the SAS Guide to the SQL Procedure provided by SAS Institute. The document covers how to create SAS data sets, data views, data listings, and macro variables.

PROC SQL Syntax to Create a Data Set

A query in a PROC SQL step is one continuous statement, made up of the clauses shown below, that ends in a single semicolon. You do not end each clause with a semicolon. (This paper refers to the individual clauses as statements.) There are no requirements for placing the statements on the same line or on separate lines of the program; they are grouped below only to separate the individual parts of a PROC SQL step logically. The order of these statements does matter, however, and must follow the order given below. Also notice that the PROC SQL step ends with a QUIT statement and not with a RUN statement.
    PROC SQL;
        CREATE TABLE output-dataset AS
        SELECT variable <, variable <AS new-variable-name> >
        FROM input-dataset <AS alias> <, input-dataset <AS alias> >
        <WHERE expression >
        <GROUP BY variable <, variable > >
        <HAVING expression >
        <ORDER BY variable <, variable > >;
    QUIT;
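As an illustration only, the template above might be filled in as follows. This is a minimal sketch, assuming the SASHELP.CLASS sample table that ships with SAS; the output name WORK.CLASS_SUMMARY and the new variable names N_STUDENTS and MEAN_HEIGHT are hypothetical and not taken from the paper.

```sas
/* Sketch only: SASHELP.CLASS is a sample table shipped with SAS.    */
/* WORK.CLASS_SUMMARY, N_STUDENTS and MEAN_HEIGHT are made-up names. */
proc sql;
    create table work.class_summary as
    select sex,                          /* existing variable         */
           count(*)     as n_students,   /* summary function, renamed */
           mean(height) as mean_height
    from sashelp.class as c              /* input data set with alias */
    where age >= 12                      /* row-level subsetting      */
    group by sex                         /* one output row per group  */
    having count(*) >= 2                 /* keep groups with 2+ rows  */
    order by mean_height desc;           /* one semicolon ends query  */
quit;                                    /* QUIT, not RUN             */
```

Every statement appears in the required order, and only the last one carries a semicolon.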

All data set and variable names must follow standard SAS naming conventions. You may use most of the data set options (such as KEEP, DROP, RENAME, and WHERE) with both the input and output data sets in a PROC SQL step. You may use the colon modifier to name variables with the KEEP and DROP data set options (for example, KEEP=AE: keeps every variable whose name begins with AE). When using these data set options, variable name lists are space delimited, as they are in a DATA step. Unlike the DATA step, PROC SQL does not support the IN= data set option.

CREATE:

The CREATE statement is used to name the data set that will contain the results of the SQL statements. The data set name in the CREATE statement can reference a temporary or permanent data set. If this statement is omitted, then PROC SQL sends the query results to the output as a data listing, much like PROC PRINT.

    CREATE TABLE TEMP
    CREATE TABLE PERM.STATOUT
    CREATE TABLE COUNTS (DROP=PERCENT)

SELECT:

The SELECT statement is used to name, rename, and/or create the variables that will make up the resulting data set. You may use SAS functions as well as the summary statistical functions available in PROC SQL to create new variables. To specify which variables to keep in the resulting data set, provide a comma separated list on the SELECT statement. To rename a variable or create a new variable using SAS functions or summary statistics, use the AS keyword.

    SELECT *
    SELECT AE.*, PERPAT.TRT, PERPAT.POP
    SELECT AE.PATNO, AE_LT, AESOC, TRT, POP

FROM:

The FROM statement is used to list the data sets that will be used as input to the PROC SQL step. If you are combining data from multiple input data sets, then you need to provide a comma separated list of input data sets.

    FROM TEMP
    FROM PERM.PERPAT (WHERE=(POP>=1))
    FROM TEMP (DROP=STATUS SORTVAL), PERM.PERPAT (KEEP=SITE PATNO TRT POP)
    FROM TEMP AS T, PERM.PERPAT AS P

WHERE:

The WHERE statement is used to specify subsetting criteria or merging criteria for observation selection and processing. If a variable name exists in more than one data set listed on the FROM statement, then you must give the two-level variable name using the data set name or alias. You can specify conditions in the WHERE statement that use SAS functions such as SUBSTR, SCAN, or INDEX.

    WHERE CALCULATED MEANX <= 25
    WHERE SUBSTR(ATC_CD,9,3) = '001'
    WHERE AE.PATNO = PERPAT.PATNO AND AE.AESOC LIKE 'B%'
    WHERE A.ID = B.ID = C.DIFFID AND (A.VISIT < B.VISIT OR A.VISIT > C.VISIT)

GROUP BY:

The GROUP BY statement in PROC SQL is used to identify sub-groups to which summary functions will be applied.

    GROUP BY TRT
    GROUP BY SITE, PATNO
    GROUP BY A.LABTEST, B.GENDER, B.AGE

HAVING:

The HAVING statement specifies a condition that must be met by each sub-group in a GROUP BY statement in order for that sub-group to be kept in the resulting data set. A HAVING statement normally includes at least one summary function (otherwise, a WHERE statement would usually do the job), but it can also contain conditions that do not involve summary functions.

    HAVING X > MEAN(X)
    HAVING (DIFF > 0 AND DIFF = MAX(DIFF))

ORDER BY:

The ORDER BY statement specifies a comma separated list of variables or valid SAS expressions used to sort the resulting table. As in PROC SORT, the default sort sequence is ascending; you can sort in descending order by placing the keyword DESC after the variable name or expression.

    ORDER BY P.SITE, P.PATNO
    ORDER BY COUNT/TOTN DESC
    ORDER BY A.SITE, A.PATNO, AEON_DT, AESEV DESC, AEREL

Example for PROC SQL statements to Create a Data Set:

The first example illustrates how to use PROC SQL to create a temporary data set by combining a permanent data set and a temporary data set, selecting a subset of existing variables and calculated variables. This example uses a CASE expression to conditionally create a new variable. Note that the resulting data set will only contain observations that have matching SITE and RANDID values in both data sets (notice that RANDID 3778 is not in the resulting COMM data set). It is helpful to remember that, unlike a DATA step merge, the input data sets in PROC SQL do not need to be sorted prior to the SQL step. The PROC SQL step may require less processing time if they are sorted in advance, but it is not a requirement.

A sample of the CLOS data set

    OBS  SITE  RANDID
      1  01    3914
      4  07    5334

A sample of the PERM.MARGINCO data set

    OBS  SITE  SCREENID  RANDID  TABLE_NA  COLUMN_N  VISIT  COMMENT2
      1  01    0009      3778    ALL007    PEBODSYS  BM     <>
      2  01    0009      3778    ALL007    VITAL_DD  T28D   <>
      3  01    0009      3778    ALL007    VMN_NM    VP1    <>
      4  01    0009      3778    ALL007    VT_NM     VP1    <>
      5  01    0009      3778    ALL007    VHR_NM    VP1    <>
      6  01    0009      3778    ALL007    CHGHR_NM  VC     <>
      7  01    0003      3914    ALL007    NEUT_NM   BM     <>
      8  01    0003      3914    ALL007    XRYHR_NM  D180   <>
      9  07    0003      5334    ALL007    ALTUN     BM     <>
     10  07    0003      5334    ALL007    GROUP_NM  A01    IMPROVED
     11  07    0003      5334    ALL007    ALTUN     T4D    <>

    PROC SQL;
        CREATE TABLE COMM AS
        SELECT M.SITE, M.SCREENID, M.RANDID,
               SUBSTR(M.TABLE_NA,8) AS TABLE_NA,
               M.COLUMN_N, M.VISIT AS VIS, M.COMMENT1,
               CASE
                   WHEN COMPRESS(M.COMMENT2) = '<>' THEN ' '
                   ELSE M.COMMENT2
               END AS COMMENT2
        FROM PERM.MARGINCO AS M, CLOS AS C
        WHERE M.SITE = C.SITE AND M.RANDID = C.RANDID
        ORDER BY M.SITE, M.RANDID, TABLE_NA;
    QUIT;

A sample of the COMM data set

    OBS  SITE  SCREENID  RANDID  TABLE_NA  COLUMN_N  VIS   COMMENT2
      1  01    0003      3914    HEM       NEUT_NM   BM
      2  01    0003      3914    XRAY      XRYHR_NM  D180
      3  07    0003      5334    ALIQUOT   GROUP_NM  A01   IMPROVED
      4  07    0003      5334    BLOODGAS  BDATE_YY  BG1
      5  07    0003      5334    CHEM      ALTUN     BM
      6  07    0003      5334    CHEM      ALTUN     T4D
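Because the listings above show only samples of the data, the example cannot be rerun exactly as printed. The following is a self-contained sketch of the same matching logic on made-up miniature data sets; the rows, the reduced set of variables, and the output name COMM_SKETCH are assumptions for illustration. It also writes the join in the explicit INNER JOIN ... ON form, which PROC SQL accepts as an alternative to the comma separated FROM list with a WHERE condition used in the paper.

```sas
/* Hypothetical miniature stand-ins for CLOS and PERM.MARGINCO. */
data clos;
    input site $ randid $;
    datalines;
01 3914
07 5334
;
run;

data marginco;
    input site $ randid $ visit $ comment2 $;
    datalines;
01 3778 BM <>
01 3914 BM <>
07 5334 A01 IMPROVED
;
run;

proc sql;
    create table comm_sketch as
    select m.site, m.randid, m.visit as vis,
           /* Blank out the '<>' placeholder, keep real comments. */
           case
               when compress(m.comment2) = '<>' then ' '
               else m.comment2
           end as comment2
    from marginco as m
         inner join clos as c
         on m.site = c.site and m.randid = c.randid
    order by m.site, m.randid;
quit;
```

RANDID 3778 has no matching row in CLOS, so an inner join leaves it out of COMM_SKETCH, mirroring the behaviour described for the COMM data set. Neither input data set was sorted beforehand.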

PROC SQL Syntax to Create Views

Creating a view works the same way as creating a data set. The purpose of using views is to reduce the real time required to complete a job by eliminating one or more I/O bound segments. If a forty-minute DATA step that takes only ten minutes of CPU time can be converted into a DATA step view, the potential real time savings for the entire job could be as much as thirty minutes. In addition to time savings, SAS data views can reduce the peak disk space requirements for a given job by reducing the redundant copies of data that must be held on disk at any given instant. If you process vast volumes of data, using SQL and DATA step views may cut significant percentages off the real time for your large SAS jobs.

The syntax for creating a view is the same as for creating a data set, with CREATE VIEW in place of CREATE TABLE. See the example below:

    data numbers;
        infile cards;
        input number @@;
        cards;
    2 3 -4 2.1 -2.2 6 -34 0
    ;
    run;

    proc sql;
        /* Create a view with 3 additional variables:               */
        /* NEGATIVE is 1 if NUMBER is negative, otherwise it's 0.   */
        /* ZERO is 1 if NUMBER is zero, otherwise it's 0.           */
        /* LOG is the log of the absolute value of NUMBER.          */
        /* There will be one observation in WITHLOG for each one in */
        /* NUMBERS.                                                 */
        create view withlog as
        select numbers.number as number,
               (sign(number) = -1) as negative,
               (number=0) as zero,
               log(abs(number)) as log
        from numbers;
    quit;

PROC SQL Syntax to Create Data Listings:

To create data listings, follow these steps:

1) Create a data set with the required variables by using PROC SQL.
2) Use that data set with PROC PRINT or PROC REPORT to produce the listing.
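As noted in the CREATE section, a SELECT with no CREATE statement prints its result directly, which can serve as a quick listing when the intermediate data set is not needed. The sketch below illustrates this under stated assumptions: it uses the SASHELP.CLASS sample table that ships with SAS, and the title text and labels are invented for the illustration.

```sas
/* Sketch: a SELECT with no CREATE TABLE writes its result to the output. */
/* SASHELP.CLASS is a sample table shipped with SAS; the title and        */
/* labels are illustrative only.                                          */
title 'Students aged 14 and over';
proc sql;
    select name   label='Student',
           age    label='Age',
           height label='Height (in)' format=6.1
    from sashelp.class
    where age >= 14
    order by age, name;
quit;
title;   /* clear the title so it does not carry over to later output */
```

The two-step approach remains the better fit when the listing needs PROC REPORT features or when the data set is reused elsewhere.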

See the example below:

    data dads;
        input famid name $ inc;
        cards;
    2 Art 22000
    1 Bill 30000
    3 Paul 25000
    ;
    run;

    data faminc;
        input famid faminc96 faminc97 faminc98;
        cards;
    3 75000 76000 77000
    1 40000 40500 41000
    2 45000 45400 45800
    ;
    run;

    proc sql;
        create table dadfam1 as
        select *
        from dads, faminc
        where dads.famid=faminc.famid
        order by dads.famid;
    quit;

    proc print data=dadfam1;
    run;

PROC SQL Syntax to Create Macro Variables

Another useful property of PROC SQL is its ability to create macro variables. PROC SQL allows the programmer to concatenate summary information from multiple data groupings into a single macro variable. The PROC SQL statements for creating macro variables are nearly identical to those for creating data sets.

    PROC SQL <NOPRINT>;
        SELECT expression1 <, expression2 <, expression3> >
        INTO :macro-variable1 <-:macro-variablen> <, :macro-variable2 <, :macro-variable3> >
             <SEPARATED BY separating character string>
        FROM input-dataset1 <, input-dataset2 <, input-dataset3> >
        <WHERE expression >
        <GROUP BY variable <, variable > >
        <ORDER BY variable <, variable > >;
    QUIT;
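A concrete instance of this template might look like the sketch below. It assumes the SASHELP.CLASS sample table that ships with SAS; the macro variable names NOBS, MEANHT, and NAMELIST are hypothetical. The TRIMMED keyword, available in recent SAS releases, strips the leading and trailing blanks that would otherwise pad the stored values.

```sas
/* Sketch using SASHELP.CLASS (sample table shipped with SAS).   */
/* NOBS, MEANHT and NAMELIST are made-up macro variable names.   */
proc sql noprint;
    /* One row of summary statistics into two macro variables. */
    select count(*), mean(height) format=8.2
        into :nobs trimmed, :meanht trimmed
        from sashelp.class;

    /* Many rows concatenated into a single macro variable. */
    select distinct name
        into :namelist separated by ' '
        from sashelp.class
        where age >= 15;
quit;

/* Write the stored values to the SAS log. */
%put Number of rows: &nobs   Mean height: &meanht;
%put Names: &namelist;
```

With NOPRINT in effect nothing is printed, so the %PUT statements are the simplest way to confirm what was stored.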

Standard SAS naming rules apply to macro variables created by PROC SQL. In Version 6.12, macro variable names may be up to 8 characters in length; SAS Version 7 and later releases allow up to 32 characters. The NOPRINT option is optional. If it is omitted, the values stored in the macro variables will also be part of the printed output. This section only covers the SELECT, INTO, and SEPARATED BY statements. The other statements are used in the same manner as they are for creating data sets.

SELECT expression

The expression used on the SELECT statement can be any of the following: a variable in the input data set, a valid SAS formula or function combining variables and/or summary functions, the result of a summary function, or any valid combination of variables, formulas, functions, and summary functions.

    SELECT MEAN(VAL)
    SELECT PUT(COUNT(DISTINCT RANDID),3.)
    SELECT (I - MEAN(I))**2
    SELECT SCAN(VNAME,1,'_') || ' = INPUT(' || TRIM(VNAME) || ', 8.)'
    SELECT DISTINCT DSNAME

INTO :macro-variable <SEPARATED BY separating character string>

The INTO statement is used to name the macro variables that will contain the results of the SELECT statement. Each macro variable name in the list must be immediately preceded by a colon (:). Multiple rows of output can be concatenated into a single macro variable by using the SEPARATED BY statement, which supplies the character string used to delimit the individual values in that concatenation.

    INTO :MEANX, :STDX, :NX
    INTO :NTRT1-:NTRT&ngrps
    INTO :MVLIST SEPARATED BY ' =.; '
    INTO :NAMES SEPARATED BY ' ', :VAL1-:VAL3

CONCLUSION

This paper has shown how to use PROC SQL to create data sets and data views, to create data listings, and to create macro variables.

REFERENCES:

SAS Institute Inc., Getting Started with the SQL Procedure, Version 6, First Edition
SAS Institute Inc., SAS Guide to the SQL Procedure: Usage and Reference, Version 6, First Edition

ACKNOWLEDGMENTS

I would like to thank the Coders Corner Co-Chairs for accepting my abstract and paper. I also thank Chauthi Nguyen and Shi-Tao Yeh for their support in presenting this paper.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors:

Raja Panchumarthi
SAS Certified Professional
Smith Hanley Consulting Group Inc
E-mail: panchumarthi@yahoo.com
Ph: 510-691-1490

Swetha Kongara
PVR Technologies Inc
350 Parsippany Rd, Suite #70
Parsippany, NJ 07054
E-mail: swetha@pvrtech.com
Ph: 973-885-4712

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.