Advanced Tutorials. The Dark Side of the Transparent Moon

The Dark Side of the Transparent Moon -- Tips, Tricks and Traps of Handling ORACLE Data Using SAS

Xinyu Ji, Fallon Clinic, Worcester, MA

ABSTRACT

The paper discusses tips, tricks and traps of handling ORACLE data using SAS. It covers, among others, the following: how to extract summary information of ORACLE data contents; the difference between ORACLE character fields and SAS character variables; how WHERE clauses under the LIBNAME engine method are passed to ORACLE; selected SAS functions and their ORACLE SQL counterparts; how to deal with ORACLE date values in SAS; how to debug errors in SQL pass-through code.

INTRODUCTION

SAS currently provides two options for connecting to ORACLE: the SQL Pass-Through Facility and the LIBNAME engine method. While in many cases the processing of ORACLE data in SAS is straightforward and pass-through transactions are transparent, there is certainly a dark side where we need to be careful of traps. This paper explores that dark area, and provides tips and tricks for handling ORACLE data using SAS.

For the purpose of this study, we assume that there is an ORACLE database account for the user "samji", and the associated password is "nesug16". The data of interest reside in the path called "nesug". The user owns a schema called "samji" and the public schema is "sasusers". All the code in this paper has been tested with Release 8.2 of the SAS System for Windows, connected to Oracle8i, and the release of SQL*Plus is 8.1.6. Be aware that results might be different if you are using a different version of SAS, DBMS and/or operating environment.

A QUICK FIRST GLANCE

The following code gives users a quick first glance at the objects stored at the path "nesug". Note that if no schema information is supplied in the LIBNAME statement, then the default schema will be used, which might be the one the user owns, namely, "samji".
    options nodate pageno=1;
    libname nesug ORACLE user=samji pw=nesug16 path='nesug' schema=sasusers;
    proc contents data=nesug._all_ position;
    run;

The first pages of the output provide directory information, listing, in alphabetical order, all the objects in the public schema in the "nesug" database that the user has access to. The following pages contain, for each object, a list of variables and attributes. The LIBNAME engine method allows us to work with the ORACLE tables/views as if they were SAS datasets (really?), and as a result, the TYPE attribute column tells us what SAS data type a variable has, which is either CHARacter or NUMeric. This information usually suffices as a first quick glance. But the next example shows that sometimes it is necessary for us to know the ORACLE data type of a field.

CHAR OR VARCHAR2? WHAT ARE YOU TALKING ABOUT?!

Well, we are not talking about SAS data types, but ORACLE data types. In a SAS dataset, there is only one type of text string variable, but in an ORACLE database, a text string field can be a CHAR type or a VARCHAR2 type. A field defined as CHAR(n) will be filled with trailing blanks up to the length of n if the character value assigned to it is shorter than n, while the character value will be assigned intact to a VARCHAR2(n) type field, even if the value is shorter than n.

When a CHAR type field is compared to expressions of another CHAR type field, or to text literals, ORACLE will use the so-called "Blank-Padded Comparison Semantics", which, before making any comparisons, first adds blanks to the end of the shorter text value to make the lengths of the comparison values equal. On the other hand, when one or both values in the comparison are of the VARCHAR2 type, ORACLE uses "Non-Padded Comparison Semantics", which makes no effort to equalize the lengths of the comparison values.

So what does all this have to do with SAS? Let's see an example. Suppose a table stored in the "samji" schema in the "nesug" path is created with the following code:

    proc sql;
      connect to oracle (user=samji password=nesug16 path='nesug');
      execute (CREATE TABLE samji.test (conference VARCHAR2(8))) by oracle;
      execute (INSERT INTO samji.test VALUES ('IEEE2003')) by oracle;
      execute (INSERT INTO samji.test VALUES ('NESUG16 ')) by oracle;

Since the field "conference" is defined as VARCHAR2(8), both the value "IEEE2003" and the value "NESUG16 " will be inserted intact into the field. That is, the first row of the table contains a value "IEEE2003" without trailing blanks, and the second row of the table contains a value "NESUG16" plus one trailing blank. Now what result will the following SAS code produce?

      create table test as
        select * from connection to oracle
          (select * from samji.test where conference='NESUG16');

The SAS temporary dataset "test" will contain 0 observations, because (select * from samji.test where conference='NESUG16') is passed to ORACLE for processing, where the field "conference" is defined as VARCHAR2(8). Two things happen because of this attribute: first, when the value "NESUG16 " is inserted into the field, the trailing blank is retained; second, when the field is compared to "NESUG16", "Non-Padded Comparison Semantics" is used. As a result, ORACLE sees "NESUG16 " and "NESUG16" as unequal and returns 0 rows. It will return 1 row if you pass (select * from samji.test where conference='NESUG16 ').

AS IF THEY WERE SAS DATASETS

In the above we see an example where results might be unexpected from our SAS perspective when we pass SQL code through to ORACLE for processing. So what about the LIBNAME engine method? Will that provide a foolproof workaround? If the LIBNAME engine method does allow us to work with the ORACLE tables/views as if they were SAS datasets, then the CHAR vs. VARCHAR2 data type difference will be a non-issue in SAS, because we know that trailing blanks do not matter, at least in the following SAS code:

    data test1;
      length conference $ 8;
      conference='IEEE2003'; output;
      conference='NESUG16 '; output;
    data test2;
      set test1;
      where conference='NESUG16';
    data test3;
      set test1;
      where conference='NESUG16 ';
    run;

So, you run the code below, thinking that both "test1" and "test2" will contain one observation.

    libname nesug ORACLE user=samji pw=nesug16 path='nesug' schema=samji;
    data test1;

      set nesug.test;
      where conference='NESUG16';
    data test2;
      set nesug.test;
      where conference='NESUG16 ';
    run;

Yet, both SAS datasets contain 0 observations. To make things even messier, you find that if you replace "where" with "if" in your SAS code, then the new SAS datasets will both contain one observation, where the value of the variable "conference" is "NESUG16". You thought you knew the difference between the WHERE clause and the IF clause, but never expected a difference like this, right?

Unfortunately, the claim that the LIBNAME engine method allows us to work with the ORACLE tables/views as if they were SAS datasets is not 100% correct, but fortunately SAS provides useful debugging tools to deal with that. Levine (2001) discusses a SAS/ACCESS LIBNAME feature called WHERE Clause Optimizer, which passes SAS WHERE clauses down to the underlying DBMS for processing. To determine if and how our code above has been passed to ORACLE, we could use the SAS system options SASTRACE and SASTRACELOC. The former generates trace information from a DBMS engine, and the latter prints trace information to a specified location. To direct trace messages to the SAS log, we could issue:

    options sastrace=',,,d' sastraceloc=saslog;

Below are some trace excerpts generated by the SASTRACE option:

    4    data test1;
    5    set nesug.test;
    6    where conference='NESUG16';
    7
    DEBUG: PREPARE SQL statement: 101 1344372249 no_name 0 DATASTEP
    SELECT "CONFERENCE" FROM samji.test WHERE ("CONFERENCE" = 'NESUG16' ) 102 1344372249 no_name 0 DATASTEP
    NOTE: There were 0 observations read from the data set NESUG.TEST.
          WHERE conference='NESUG16';
    DEBUG: Close Cursor - CDA=102315840 103 1344372249 no_name 0 DATASTEP
    NOTE: The data set WORK.TEST1 has 0 observations and 1 variables.

    8    data test2;
    9    set nesug.test;
    10   where conference='NESUG16 ';
    11
    DEBUG: PREPARE SQL statement: 107 1344372249 no_name 0 DATASTEP
    SELECT "CONFERENCE" FROM samji.test WHERE ("CONFERENCE" = 'NESUG16' ) 108 1344372249 no_name 0 DATASTEP
    NOTE: There were 0 observations read from the data set NESUG.TEST.
          WHERE conference='NESUG16 ';
    DEBUG: Close Cursor - CDA=102315840 109 1344372249 no_name 0 DATASTEP
    NOTE: The data set WORK.TEST2 has 0 observations and 1 variables.

    12   data test3;
    13   set nesug.test;
    14   if conference='NESUG16';
    15
    NOTE: There were 2 observations read from the data set NESUG.TEST.
    DEBUG: Close Cursor - CDA=102315840 113 1344372249 no_name 0 DATASTEP

    NOTE: The data set WORK.TEST3 has 1 observations and 1 variables.

Both the first and the second DATA step pass

    SELECT "CONFERENCE" FROM samji.test WHERE ("CONFERENCE" = 'NESUG16')

to ORACLE regardless of whether we ask for where conference='NESUG16' or where conference='NESUG16 ' in the SAS DATA step, because before SAS makes the pass-through, it has already trimmed the trailing blanks after 'NESUG16', as SAS does for any such comparison in a SAS DATA step. Notice that no WHERE condition is passed to ORACLE when the subsetting IF statement is used instead. The ORACLE data are read into SAS, and the subset with one observation is created in SAS.

To ensure that ORACLE data processing and SAS dataset processing generate consistent results through the LIBNAME engine method connection, we could turn off the Query Optimization feature as in the following:

    libname nesug ORACLE user=samji path='nesug' schema=samji pw=nesug16 DIRECT_SQL=(NONE);

If you define the library "nesug" in this way, and re-run the code with the WHERE clause, you will see no WHERE condition passed to ORACLE, and the subset with one observation will be correctly created.

Now let's go back to the ORACLE table "test". The data probably need to be cleaned in the first place in ORACLE to trim the trailing blanks from the "conference" field, which has been defined as VARCHAR2(8) type. Yet the example provides us with a new perspective concerning data quality. In fact, I was prompted to investigate the difference between the SAS text data type and the ORACLE text data types when a WHERE clause returned a much smaller subset than expected during a work assignment. A first step in the investigation probably is to find out whether the field in question is a CHAR type or a VARCHAR2 type. But

CHAR OR VARCHAR2? HOW DO I KNOW?

We know from our first example that if we connect to ORACLE through the LIBNAME engine method, then we can execute the CONTENTS procedure against an ORACLE table/view.
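As a minimal sketch of such a CONTENTS call against a single table, assuming the "nesug" libref points at the "samji" schema with the credentials used throughout this paper:

```sas
/* Sketch only: assumes the ORACLE account, path and table described
   earlier in this paper (user samji, path nesug, table samji.test). */
libname nesug ORACLE user=samji pw=nesug16 path='nesug' schema=samji;

/* List the columns of one ORACLE table as SAS sees them. */
proc contents data=nesug.test position;
run;
```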
The output will list the data type for each field in the ORACLE data. However, we can only tell whether a field is a character variable or a numeric variable. SAS will not distinguish between CHAR type and VARCHAR2 type for character variables. For that piece of information, we need to dig into the ORACLE Data Dictionaries. Views whose names begin with "USER_" are dictionaries containing information about tables/views owned by a user, while views whose names begin with "ALL_" are dictionaries containing information about tables/views accessible by a user.

The following SQL pass-through code extracts column name, data type, and data length for all columns in the table "test" in the schema "samji" from the ORACLE Data Dictionary "USER_TAB_COLUMNS". The values for character variables in those dictionaries are by default in upper case, so the table name is given in upper case.

    options pageno = 1;
    proc sql;
      connect to oracle (user=samji password=nesug16 path='nesug');
      select * from connection to oracle
        (select column_name, data_type, data_length
           from user_tab_columns
          where table_name='TEST'
          order by column_name);

The following code extracts the same information, but from a different ORACLE Data Dictionary, "ALL_TAB_COLUMNS". Since this view contains information not only for tables that the user "samji" owns, but also for tables that the user does not own yet has access to, we need to specify in the WHERE clause both the table name and the owner of the table (in this case schema "samji"), in case there is another table/view with the same name in the public schema. Both blocks of code will show that the "conference" field is of VARCHAR2(8) type.

    options pageno = 1;

    proc sql;
      connect to oracle (user=samji password=nesug16 path='nesug');
      select * from connection to oracle
        (select column_name, data_type, data_length
           from all_tab_columns
          where owner='SAMJI' and table_name='TEST'
          order by column_name);

Now suppose you find out "conference" is a VARCHAR2(8) type field and you know for sure its values contain unnecessary trailing blanks. There are several ways to circumvent the data flaws, but taking efficiency (in the sense of limiting the number of rows returned to SAS from the DBMS) into consideration, are the following two DATA steps equally efficient?

    libname nesug ORACLE user=samji pw=nesug16 path='nesug' schema=samji;
    data test1;
      set nesug.test;
      where trim(conference)='NESUG16';
    data test2;
      set nesug.test;
      where trimn(conference)='NESUG16';
    run;

The only difference between the above two DATA steps seems to be that the first one uses the TRIM function and the second uses TRIMN. TRIM and TRIMN both remove trailing blanks from a character expression, but TRIM returns one blank if the expression is missing while TRIMN returns a null string if the expression is missing. A further investigation into the messages generated by SASTRACE reveals the following for the first DATA step:

    DEBUG: PREPARE SQL statement: 60 1344778904 no_name 0 DATASTEP
    SELECT "CONFERENCE" FROM samji.test 61 1344778904 no_name 0 DATASTEP

Log messages for the second DATA step:

    DEBUG: PREPARE SQL statement: 66 1344778904 no_name 0 DATASTEP
    SELECT "CONFERENCE" FROM samji.test WHERE ( RTRIM("CONFERENCE") = 'NESUG16' ) 67 1344778904 no_name 0 DATASTEP

When TRIM is used, the WHERE Clause Optimizer passes down SELECT "CONFERENCE" FROM samji.test to ORACLE. The whole result set is then returned from the DBMS to SAS, and there the criterion where trim(conference)='NESUG16' is applied, and qualifying rows are output to the SAS temporary dataset "test1". When TRIMN is used, the WHERE Clause Optimizer passes down SELECT "CONFERENCE" FROM samji.test WHERE ( RTRIM("CONFERENCE") = 'NESUG16' ) to ORACLE.
Only rows satisfying the condition in the WHERE clause are returned to SAS and output to the temporary dataset "test2". The second DATA step limits the number of rows returned to SAS from the DBMS, and therefore is more efficient. Note that when the WHERE Clause Optimizer makes the pass-through, it translates the SAS function TRIMN to the ORACLE SQL function RTRIM. Which leads us to the discussion of

SAS FUNCTIONS AND THEIR ORACLE SQL COUSINS [1]

When we use the LIBNAME engine method to access ORACLE data, a feature called SQL query optimization tries to offload to the DBMS as much as possible of the processing that normally would occur in SAS. These optimizations might occur in either WHERE clauses (as we have seen in several examples above) or PROC SQL. If SAS functions are used in WHERE clauses or PROC SQL, SAS/ACCESS first tries to either directly pass or translate them to ORACLE for processing. Only when the attempt fails does SAS process them in SAS itself. To fully take advantage of the efficiency improvement of passing processing to the underlying DBMS, it is important to know which SAS functions get passed or translated under the LIBNAME engine method.

[1] This section is not meant to be a complete review of SAS functions and their ORACLE SQL counterparts. Only selected functions are discussed here.

In PROC SQL, the aggregate functions MIN, MAX, AVG, MEAN, FREQ, N, SUM, COUNT are SQL ANSI-defined. Therefore they are directly passed to ORACLE. If other SAS functions are called in PROC SQL or WHERE clauses, SAS/ACCESS first tries to translate them into their ORACLE-specific equivalents. If the translation is successful, then the processing is passed to ORACLE. Below is a list of SAS functions which SAS/ACCESS passes to ORACLE for processing (the ORACLE SQL equivalent is given in parentheses if its name is different from the SAS name):

    Arithmetic Functions: ABS, SIGN, SQRT
    Character Functions: LOWCASE (LOWER), SOUNDEX, TRIMN (RTRIM), TRANSLATE, UPCASE (UPPER)
    Date and Time Functions: DATETIME (SYSDATE)
    Mathematical Functions: EXP, LOG, LOG10 (LOG), LOG2 (LOG)
    Trigonometric and Hyperbolic Functions: ARCOS (ACOS), ARSIN (ASIN), ATAN, COS, COSH, SIN, SINH, TAN, TANH
    Truncation Functions: CEIL, FLOOR

If a function cannot be translated by SAS/ACCESS into an ORACLE-specific function, then SAS retrieves all the rows from the DBMS and processes them in SAS.

In the list above, we find that quite a few SAS functions share the same names with their ORACLE SQL equivalents. However, there are some other pairs of functions that share the same name in SAS and in ORACLE SQL, perform the same functionality, and have the same syntax, yet the SAS/ACCESS LIBNAME engine does not translate them into ORACLE SQL functions. SUBSTR (right of = ) is an example:

    51   data test;
    52   set nesug.test;
    53   where substr(conference,1,5)='NESUG';
    54
    DEBUG: PREPARE SQL statement: 79 1344792665 no_name 0 DATASTEP
    SELECT "CONFERENCE" FROM jixxi01.test 80 1344792665 no_name 0 DATASTEP

In this case, using the SQL Pass-Through Facility as follows is more efficient than the LIBNAME engine connection method.
Note that the ORACLE SQL function SUBSTR instead of the SAS function SUBSTR has been called:

    select * from connection to oracle
      (select * from samji.test where substr(conference,1,5)='NESUG');

In addition to the function SUBSTR (right of = ), LENGTH, MOD and REVERSE fall into the same category. To take advantage of the performance improvement of passing processing to ORACLE, we should, whenever possible, use the SQL Pass-Through Facility and call the ORACLE SQL functions with the same name, instead of calling them as SAS functions in either WHERE clauses or PROC SQL under the LIBNAME engine method. However, users should be aware of the following two subtle differences between SAS LENGTH and ORACLE SQL LENGTH: first, the SAS function LENGTH returns an integer representing the position of the right-most nonblank character in the argument, while the ORACLE SQL function LENGTH returns the number of characters in the string argument including trailing blanks; second, the SAS function LENGTH returns a value of 1 if the value of the argument is missing, while the ORACLE SQL function LENGTH returns null if the argument is null.

There are two other pairs of functions for which SAS and ORACLE SQL share the same names. I did not include them in the same category as SUBSTR, LENGTH, and MOD, because of the considerable syntax differences between the SAS functions and their ORACLE SQL counterparts.

The first is the ROUND function. In SAS, ROUND(n,m) returns the value n rounded to the nearest round-off unit, represented by m, while in ORACLE SQL, ROUND(n,m) returns n rounded to m places to the right of the decimal point. In ORACLE SQL, m can be negative to round off digits left of the decimal point, but m must be an integer.

The second is the TRIM function. You might recall that when TRIMN is called in a WHERE clause or PROC SQL under the LIBNAME engine method, SAS/ACCESS tries to translate SAS TRIMN to ORACLE SQL RTRIM. Therefore, if we make the connection to ORACLE with the SQL Pass-Through Facility, then we could call the ORACLE SQL function RTRIM in the pass-through code to accomplish the tasks done with TRIMN in SAS. If the TRIM function is used in a WHERE clause or PROC SQL under the LIBNAME engine method, SAS/ACCESS does not translate TRIM to any ORACLE SQL function. However, in ORACLE SQL there is also a TRIM function:

    TRIM( [ [ [<trim_spec>] char ] FROM ] string )

<trim_spec> is one of LEADING, TRAILING, or BOTH. char represents a single character to be trimmed. string is the target string to be trimmed. When <trim_spec> is omitted, BOTH is implied, and when char is omitted, a space character is implied. Therefore, in ORACLE SQL, TRIM(string) is the equivalent of TRIM(LEFT(string)) in SAS.
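To make the SAS side of this equivalence concrete, here is a small sketch in plain SAS that needs no ORACLE connection. The variable names and the sample value are made up for illustration, and the ORACLE counterparts appear only in comments.

```sas
/* Illustrative only: s, both_ends and right_only are hypothetical names. */
data _null_;
  length s $ 12;
  s = '  NESUG16   ';                        /* 2 leading and 3 trailing blanks  */
  both_ends  = '[' || trim(left(s)) || ']';  /* both sides stripped, as ORACLE TRIM(s) does   */
  right_only = '[' || trimn(s) || ']';       /* trailing blanks only, as ORACLE RTRIM(s) does */
  put both_ends= / right_only=;
run;
```

The log should show both_ends=[NESUG16] and right_only=[  NESUG16], that is, only the first expression removes the leading blanks.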
Because the ORACLE SQL TRIM function strips both ends by default, SAS TRIMN is translated into ORACLE SQL RTRIM instead: RTRIM(string [, char]) returns string with its right-most characters removed, up to the last character that is not in the argument char. When char is omitted, a space character is implied, which equates RTRIM(string) in ORACLE SQL with TRIMN(string) in SAS. And you guessed it right: there is also an LTRIM function in ORACLE SQL that mirrors RTRIM.

86,400! THAT ESOTERIC NUMBER

So you have been asked to go to a huge ORACLE table containing 10 years of transactions and extract only the transactions occurring on the Christmas Eve of each year. You have no clue what to do, but fortunately you have a piece of legacy code to accomplish the task. So following the code, you load the entire ORACLE table into SAS, pick out the variable capturing the time of each transaction, divide it by 86,400, apply the MONTH function to get the month, apply the DAY function to get the day of the month, use a subsetting WHERE clause, and finally you get a perfect sample for Christmas Eves. You, for yet another time, are confirmed in your belief that computers are strange animals -- from time to time you need to feed them with some unlikely food, such as the esoteric number 86,400, to get things done.

This section discusses how to handle ORACLE dates and times with SAS. Many of us have already felt frustrated dealing with dates and times in SAS. However, adding ORACLE to the scenario does not necessarily make things worse. With the help of some ORACLE date functions, life might be easier for us.

In SAS, we actually do not have a separate data type for date variables. The so-called SAS date variables are just numeric variables representing SAS date values. Unless they are properly formatted, they are displayed as the number of days since Jan. 1, 1960. When we need to read date values from external data sources such as ASCII files into SAS, we first have to find out how the date values are displayed.
Most likely it is not the number of days since Jan. 1, 1960. Rather, external files have almost uncountable ways to represent a date value of Apr. 1, 2003. It might look like 20030401, or 04012003, or 4/1/03, or Apr-1-2003, or APR0103, etc. After we find out how the date values are represented, we normally have two options to read them into SAS properly. We could either read them as character fields first and convert them into SAS date values (the number of days since Jan. 1, 1960) later using a date informat, or we could directly read them as numeric variables representing SAS date values using a date format. Our success hinges on finding the appropriate date informats or date formats. If the raw date values are not displayed in a way for which SAS provides pre-defined informats or formats, then you need to do further processing before SAS can recognize them as date values.

However, processing is much easier when reading ORACLE date values into SAS. No matter whether you use the SQL Pass-Through Facility or the LIBNAME engine method, SAS can recognize the ORACLE date columns and automatically read them into SAS as numeric variables with DATETIME20. as both informat and format. The resulting SAS variables are displayed as ddmmmyyyy:hh:mm:ss, and as numeric variables, their values represent the number of seconds since midnight, Jan. 1, 1960. For example, Apr. 1, 2003 looks like 01APR2003:00:00:00. Since 24 hours amount to 86,400 seconds, dividing by 86,400 returns the number of days since Jan. 1, 1960. It has the same effect as calling the DATEPART function, although the dividing-by-86400 approach has the added benefit of enhancing job security.
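The arithmetic can be checked with a short sketch in plain SAS. The variable names and the sample datetime are chosen here for illustration, and FLOOR is added so that the two results match exactly, since plain division leaves a fractional day.

```sas
/* Illustrative only: variable names and the sample value are hypothetical. */
data _null_;
  tran_dt     = '24DEC2002:19:56:37'dt;   /* seconds since 01JAN1960:00:00:00 */
  by_datepart = datepart(tran_dt);        /* whole days since 01JAN1960       */
  by_division = floor(tran_dt / 86400);   /* same value; FLOOR drops the time of day */
  mon = month(by_datepart);
  dom = day(by_datepart);
  put by_datepart= date9. by_division= date9. mon= dom=;
run;
```

Both date values should print as 24DEC2002, with mon=12 and dom=24.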

Now let's go back to the example presented at the beginning of this section. The inefficiency of the approach is very obvious, because we read into SAS the data of the other 3,640 or so needless days. It is more efficient to ask ORACLE to return to SAS only the data for the 10 days we need. However, under the LIBNAME engine method, the SAS functions MONTH and DAY in WHERE clauses will not be translated and passed to ORACLE for processing. Therefore, SAS has to first read the entire ORACLE table before the subsetting WHERE clause gets processed. On the other hand, the SQL Pass-Through Facility can return to SAS only the needed rows, but the subsetting has to be done in the ORACLE way, which requires us to know the ORACLE SQL counterparts of the SAS functions MONTH and DAY.

In ORACLE, there are several ways to achieve this. One helpful SQL function is TO_CHAR(date number [, format]). TO_CHAR converts a date or a number to a value of the VARCHAR2 data type using the optional format. Unlike SAS, ORACLE has one data type for numbers and another separate data type for dates. In this section, we will focus on converting dates, as converting numbers is more straightforward. In ORACLE, whenever a date value is displayed, ORACLE automatically calls the TO_CHAR function with the default date format mask, usually DD-MON-YY. You could override the default format with the format string in the TO_CHAR function.
For example, suppose "tran_dt" is an ORACLE date field whose value is "24-Dec-2002 19:56:37", then:

    TO_CHAR(tran_dt,'YYYY')   returns '2002'
    TO_CHAR(tran_dt,'YY')     returns '02'
    TO_CHAR(tran_dt,'Q')      returns '4'
    TO_CHAR(tran_dt,'MONTH')  returns 'DECEMBER'
    TO_CHAR(tran_dt,'MON')    returns 'DEC'
    TO_CHAR(tran_dt,'MM')     returns '12'
    TO_CHAR(tran_dt,'DD')     returns '24'
    TO_CHAR(tran_dt,'DY')     returns 'TUE'
    TO_CHAR(tran_dt,'HH')     returns '07'
    TO_CHAR(tran_dt,'HH24')   returns '19'
    TO_CHAR(tran_dt,'MI')     returns '56'
    TO_CHAR(tran_dt,'SS')     returns '37'

Bear in mind that essentially TO_CHAR is for getting date values out of ORACLE and displaying them as specified by the format string, so guess what is returned to you by the following: TO_CHAR(tran_dt, ':) DY MON DD, YYYY???!!!!')?

So we have just discussed how to get date values out of ORACLE, but how about getting date values into ORACLE? For example, if you only need to extract transactions occurring on Dec. 24, 2002, then under the LIBNAME engine method, you could subset the data with where tran_dt='24dec2002'd. But if we are using the SQL Pass-Through Facility, then passing down where tran_dt='24dec2002' to ORACLE may or may not work. When a statement like the above is passed to ORACLE, ORACLE expects the expression in the quotation marks to be a date value, because we ask ORACLE to compare the expression with a date field. Whenever ORACLE expects a date value, it automatically calls the TO_DATE function with the default date format mask. Therefore only when '24DEC2002' is in agreement with the default will the WHERE clause work correctly. A safer approach is of course to explicitly call TO_DATE(char, format). The valid format strings here are the same as those that can be used in the TO_CHAR function discussed earlier, and the char string should match the format string. So, you can pass this to ORACLE: where tran_dt=to_date('24dec2002','ddmonyyyy').
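Putting TO_CHAR and TO_DATE together, a pass-through query for the Christmas Eve task might look like the sketch below. It is only an illustration under stated assumptions: it assumes that the table sasusers.transactions has a date column named tran_dt, the output dataset names are made up, and the credentials are the ones assumed throughout this paper. Because an ORACLE date also carries a time of day, the single-day query is written as a range rather than an equality, which would match only rows stamped exactly at midnight.

```sas
proc sql;
  /* Assumed credentials and path: the ones used throughout this paper. */
  connect to oracle (user=samji password=nesug16 path='nesug');

  /* Christmas Eve of every year: compare month and day as text.
     tran_dt is assumed to be a date column of this table. */
  create table xmas_eves as
    select * from connection to oracle
      (select *
         from sasusers.transactions
        where to_char(tran_dt, 'MMDD') = '1224');

  /* One specific day, written as a half-open range so that any
     time of day on 24 Dec 2002 qualifies. */
  create table xmas_eve_2002 as
    select * from connection to oracle
      (select *
         from sasusers.transactions
        where tran_dt >= to_date('24dec2002', 'ddmonyyyy')
          and tran_dt <  to_date('25dec2002', 'ddmonyyyy'));

  disconnect from oracle;
quit;
```

In both queries the subsetting happens inside ORACLE, so only the qualifying rows travel back to SAS.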
In SAS, a date variable is virtually a numeric variable, therefore you could add or subtract a constant to or from it to get another date value. In ORACLE, although we have one data type for date values and a separate data type for numeric values, we can still perform date arithmetic as we do in SAS. For example, the following code will return to SAS all rows loaded into the ORACLE table during the past 24 hours, assuming that the time a row is loaded into a table is captured by the field "row_added_dt":

    select * from connection to oracle
      (select * from sasusers.transactions
        where row_added_dt + 1 > sysdate);

MISCELLANEOUS

This section briefly discusses several other issues related to processing ORACLE data with SAS.

I think that the most important issue that I have not yet covered is the difference between ORACLE null values and SAS missing values. There is an excellent technical paper on this topic, written by Levine (online paper), available at the SAS Institute's web site.

Many times, we want ORACLE to return to SAS only a small portion of an entire ORACLE table, so that we can run some test code. There are a couple of ways to accomplish this. If the test sample need not be randomly selected from the entire table, then we could simply subset with the ORACLE table's pseudo-column ROWNUM, like the following:

    select * from connection to oracle
      (select * from sasusers.transactions where rownum <= 1000);

However, if you need a random sample, then you could try the sample (p) clause or the sample block (p) clause, where p denotes a percentage between 0.000001 and 100. The following code generates a 1% random sample:

    select * from connection to oracle
      (select * from sasusers.transactions sample block(1));

On the SAS Institute's web site, there are a couple of technical support notes for Release 8 of the SAS System warning of bugs in data retrieval through the LIBNAME engine method connection. Quite a few cases involve the usage of WHERE clauses. The web addresses for some of these technical notes are listed below. They might suggest that the SQL Pass-Through Facility is currently more robust than the LIBNAME engine method in connecting SAS to ORACLE:

    http://www.sas.com/techsup/download/hotfix/v82/base/82ba63/82ba63.html
    http://www.sas.com/techsup/download/hotfix/82_sbcs_prod_list.html#003976
    http://www.sas.com/techsup/download/hotfix/82_sbcs_prod_list.html#004451

LET THERE BE LIGHT

If you frequently work with ORACLE data using the SAS SQL Pass-Through Facility, I am sure that more often than not you feel yourself being driven mad by those clueless crimson "ORACLE prepare error" messages.
You feel as if you are walking in a pitch-dark night, and wish SAS could throw more light on what is going wrong. Many of us who have the same experience have found SQL*PLUS a good alternative debugging tool. It is certainly not the answer to everything, but many times it is very helpful. Let's consider an example as simple as the following:

    select * from connection to oracle
      (select conference as thisname, conference as thatname, conference as group,
              conference as analias, conference as yetanotheralias
         from samji.test);

You see a flash of crimson in the SAS log and it gives you:

    ERROR: ORACLE prepare error: ORA-00923: FROM keyword not found where expected. SQL statement: select conference as thisname, conference as thatname, conference as group, conference as analias, conference as yetanotheralias from samji.test.

You are surely lost because you think you do have the FROM keyword at the right place. But if we select and copy the code following "SQL statement:" in the SAS error message, paste it into SQL*PLUS, and run it, SQL*PLUS will produce the following:

    select conference as thisname, conference as thatname, conference as group, confe
                                                                         *
    ERROR at line 1:
    ORA-00923: FROM keyword not found where expected

The nice little thing that SQL*PLUS does is to put an "*" at the place that is suspected to cause the problem. When we notice that the alias "group" has been singled out by SQL*PLUS, even if we still don't know why, we can try a tactic such as changing the alias into "group1". If it works, then we can ask our DBAs the very specific question "why can't 'group' be used as an alias in ORACLE?". It turns out the word "group" is reserved in ORACLE. If you do want to use it as an alias, you have to put it in double quotation marks: conference as "group".

Of course, code working in SQL*PLUS does not guarantee that statements won't fail when SAS passes them down to ORACLE. For example, SQL*PLUS has a command called DESCRIBE which prints a list of all columns in a table/view along with their associated data type and length. However, the following SAS code will generate the error message "ERROR: ORACLE execute error: ORA-00900: invalid SQL statement.":

    execute (DESCRIBE samji.test) by oracle;

The error occurs because the SAS SQL Pass-Through Facility passes code to ORACLE SQL*NET, not SQL*PLUS.
If a command or a statement is specific to SQL*PLUS, but not part of the base ORACLE SQL database language, then it is invalid to the ORACLE SQL*NET layer. Nevertheless, SQL*PLUS in many cases is still a helpful debugging tool.

REFERENCES

Levine, F. (online paper). "Potential Result Set Differences between Relational DBMSs and the SAS System". http://www.sas.com/rnd/warehousing/papers/resultsets.pdf

Levine, F. (2001). "Using the SAS/ACCESS Libname Technology to Get Improvements in Performance and Optimizations in SAS/SQL Queries". Proceedings of the Twenty-sixth Annual SAS Users Group International Conference, 26.

ACKNOWLEDGEMENTS

Many thanks to Becky Ikehara, the Database Administrator at Fallon Clinic, who, despite my incessant emails with ORACLE related puzzles, always answers my questions promptly and patiently. I also wish to thank the SAS Technical Support team for its invaluable assistance.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

    Xinyu Ji
    Fallon Clinic
    100 Front Street, Worcester Office Tower, 14th Floor
    Worcester, MA 01608
    (O) 508-368-5487
    Email: Xinyu.Ji@fallon-clinic.com