Getting Your Data into SAS Stephanie R. Thompson Where is your data stored? Oracle tables SQL Server tables Microsoft Access Microsoft Excel Text file SPSS All over the place 1
What s the solution? SAS/ACCESS Interface! What can you access? SAS/ACCESS validation matrix http://support.sas.com/matrix Access to Relational Databases and Data Warehouse Appliances Aster Data ncluster DB2 Greenplum Hadoop HP Neoview Informix Microsoft SQL Server MySQL Netezza ODBC OLE DB Oracle (including Sun Oracle Database Machine [Exadata]) Sybase Sybase IQ Teradata 2
Access to nonrelational databases Operating System Dependent ADABAS CA-DATACOM/DB SAS/Access DATA Step Interface to CA-IDMS IMS PC Files SYSTEM 2000 How do I know what I have licensed? proc setinit noalias ; run; 3
Ways to Access Your Data SAS libname engine PROC SQL ODBC PROC IMPORT Import Wizard Libname Refresher Point to where your data are Think of as shortcuts or nicknames Can be 1 to 8 characters Start with an underscore or letter Cannot start w/a number or other special character 4
Libname General Form LIBNAME libref <engine> physical path ; Libref you provide <engine> is optional Refers to type of data to read Physical path is operating system dependent Libname Engine Connect to relational database tables Oracle, DB2, etc. Connect to other file types Excel, SPSS, etc. SAS treats tables like SAS datasets Need to define the engine Can include ID and PWD 5
Libname Engines Based on what is licensed Some with Base SAS SAS/ACCESS optional engines Some are only for certain platforms Release resource when done Excel Example libname fac 'R:\Employee Data\Employee Database.xls' access=readonly ; data faculty_all(drop = months) ; set fac.'no Costing$'n ; [ add l SAS statements ] ; run; 6
Accessing Databases Databases on PC Servers Use of ID and passwords Additional libname options Non-SAS Oracle Tips Check your IP address Static or dynamic? Determine your IP address: go to a DOS prompt : Start run CMD Type ipconfig/all IP address listed after IP Address Obtain a static IP address for your PC. This will be necessary if your IT department opens the firewall only to specific IP addresses. 7
tnsnames.ora General form (from Oracle website) net_service_name= (DESCRIPTION= (ADDRESS=(protocol_address_information)) (CONNECT_DATA= (SERVICE_NAME=service_name))) Can have multiple services in one file More Non-SAS Oracle Tips Obtain any security clearances that you may need. Security may be by instance, schema, table, table elements Install the Oracle client on your PC Configure the Oracle client to access your data Edit tnsnames.ora for your database specifics 8
Protect your ID and Password! libname st 'C:\My Documents\My SAS Files'; data cred; set st.cred; call symput('stid', stid); call symput('stpwd', stpwd); run; Oracle Using Libname libname mylib oracle user=&stid pw=&stpwd path=rpts.spectrumreports schema=saturn; oracle is the engine how the data are stored 9
Accessing Oracle Tables proc sql; create table registered as select distinct sfrstcr_pidm as pidm from mylib.sfrstcr where [add l SQL statements] order by sfrstcr_pidm; quit; Just Using PROC SQL proc sql noerrorstop; connect to oracle(user=&stid pw=&stpwd path=rpts.spectrumreports); create table regdetail as select * from connection to oracle (select pidm, levl_desc, styp_code from sgbstdn a, sfbetrm where [add l SQL statements] ); quit; 10
Why Use SQL? Takes advantage of SAS SQL Pass- Through Facility Understood by database host system Standardized language No need for translation Benefits of using PROC SQL SQL Pass-Though Facility! Speed!!!!!! 11
Benefits of using PROC SQL (cont d) The Pass-Through Facility uses SAS/ACCESS to connect to a DBMS and to send statements directly to the DBMS for execution. This facility is an alternative to the SAS/ACCESS LIBNAME statement. It enables you to use the SQL syntax of your DBMS, and it supports any non-ansi standard SQL that is supported by your DBMS. (SAS online documentation) Benefits of using PROC SQL (cont d) SQL code can be processed on system hosting the database Increase speed Increase efficiency Take advantage of additional functions 12
Other Notes DATA step code is not passed to Oracle when using the LIBNAME engine PROC PRINT is passed to Oracle as a SELECT query PROC FREQ is sent to Oracle as a select * from table Other Notes (cont d) WHERE criteria will get passed to Oracle SAS functions can be used in PROC SQL (except LAG and DIF) More functionality in SAS 9.2 and beyond Ability to see what is being passed via a trace 13
Using PROC SQL w/odbc proc sql; connect to odbc(dsn=mdsn uid=&usr pwd=&pwd); create table studs as select * from connection to odbc ( [add l SQL statements] ); disconnect from odbc; quit; mdsn is the name of the ODBC connection set up on my PC Setting up an ODBC Connection From Control Panel 14
Setting up an ODBC Connection (cont d) Setting up an ODBC Connection (cont d) Why use? Needed for certain types of databases SQL Server To use the libname engine instead of PROC IMPORT 15
Import File Types Base PROC IMPORT text files.dat tab delimited comma separated delimited Import File Types SAS/ACCESS to PC Files JMP Excel MS Access dbase Lotus 16
PROC IMPORT Allows you to import many types of PC files Add code to your program Use the Wizard Use the Wizard to create code for you PROC IMPORT - Excel PROC IMPORT OUT= WORK.mydata DATAFILE= "H:\Stephanie\Excel\Student_Data.xls" DBMS=EXCEL REPLACE; SHEET="Sheet1$"; GETNAMES=YES; MIXED=NO; SCANTEXT=YES; USEDATE=YES; SCANTIME=YES; RUN; 17
Import Wizard Log Window 1 PROC IMPORT OUT= WORK.mydata 2 DATAFILE= "H:\Stephanie\Excel\Student_Data.xls" 3 DBMS=EXCEL REPLACE; 4 SHEET="Sheet1$"; 5 GETNAMES=YES; 6 MIXED=NO; 7 SCANTEXT=YES; 8 USEDATE=YES; 9 SCANTIME=YES; 10 RUN; NOTE: WORK.MYDATA was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 2.37 seconds cpu time 0.03 seconds 18
Flat Files Delimited Import Wizard Data Step Flat File (non-delimited) Data step Know your data! Make it Easy on Yourself 19
References Absolute Preceded by the @ symbol Indicate a specific column to begin reading in data Sample line input @1 week 2. @5 home $2.; References (cont d) Relative Relative to current location Can be positive or negative Sample line input week 2. +2 home $2.; 20
Tips Know Thy Data! Understand record layout for flat files Libname engine vs. import Depends on what you need to do What you have licensed Tips (cont d) Database Views Contain pre-combined sets of information Can cover many different base tables May include pre-calculated variables Can be resource intensive 21
Tips (cont d) Tables Learn table structures One record per key Multiple records per key Use validation tables Know keys needed to join tables Single? Many? Tips (cont d) Data step vs. PROC SQL Both will work Data step runs much more slowly System dependent Try both and see which is faster Data step only w / libname engine No SQL pass-through w / data step 22
Enterprise Guide Getting data into EG Questions? 23
Thank You! 24