Using SAS ACCESS to retrieve and store data in relational database management systems Department of Biology Dalhousie University SHRUG meeting, February 23rd 2007
Disclaimer Background SAS is the only proprietary software that I use I use SAS to fit models, and for nothing else I use SAS for Unix running on a variety of Unices My computing toolkit Linux OS (Ubuntu and CentOS) PostgreSQL with postgis unixodbc The R software environment for statistics and graphics The Generic Mapping Tools (GMT) The usual suspects (Emacs, Perl, shell scripts)
Background What I do Fisheries research Models of fish population dynamics to determine past, present and future abundance and biomass. The catch rate is used to estimate abundance and biomass. Counting fish in the ocean is like counting trees in the forest, except that you can t see them and they move.
Background What I do Fisheries research Models of fish population dynamics to determine past, present and future abundance and biomass. The catch rate is used to estimate abundance and biomass. Counting fish in the ocean is like counting trees in the forest, except that you can t see them and they move.
Background What I do Fisheries research Models of fish population dynamics to determine past, present and future abundance and biomass. The catch rate is used to estimate abundance and biomass. Counting fish in the ocean is like counting trees in the forest, except that you can t see them and they move.
Background Relational Database Management System (RDBMS) What? Structured Query Language (SQL) Create tables and load data Access data through SQL queries Oracle, DB2, MS Access, PostgreSQL, mysql,... Why? Indexing of data and optimisation of queries to perform retrievals Provides a neutral layer that can be used by most analytical software through database bindings OpenGIS Simple Features Specification for SQL - to define geospatial data and perform queries with spatial operators
Background Relational Database Management System (RDBMS) What? Structured Query Language (SQL) Create tables and load data Access data through SQL queries Oracle, DB2, MS Access, PostgreSQL, mysql,... Why? Indexing of data and optimisation of queries to perform retrievals Provides a neutral layer that can be used by most analytical software through database bindings OpenGIS Simple Features Specification for SQL - to define geospatial data and perform queries with spatial operators
SAS ACCESS Background Database connectivity to Oracle, DB2 and ODBC PROC SQL with pass-through facility PostGIS runs on PostgreSQL No support for PostgreSQL or mysql However, Open Source RDBMS all come with ODBC drivers SAS/ACCESS(R) 9.1.3 for Relational Databases: Reference, Second Edition
ODBC Background Open Database Connectivity (ODBC) provides a standard software API method for using database management systems
Background Getting data from a RDBMS into SAS Assuming that you have an ODBC Data Source Name (DSN) called DSNentry, the following code will allow you to define the RDBMS as a SAS libref. libname myrdbmslib odbc USER= username DATASRC= DSNentry SCHEMA= public ;
Background Getting data from a RDBMS into SAS (cont.) Assuming that you have an ODBC DSN called DSNentry, the following code using PROC SQL pass-through facility will allow you to get data directly from the RDBMS. proc sql; connect to odbc (user=username datasrc=dsnentry); create table mydata as select * from connection to odbc ( SQL QUERY GOES HERE ); disconnect from odbc; quit;
Background Moving data out of SAS and into a RDBMS libname gfsdb odbc USER= postgres DATASRC= gfsdb SCHEMA= public ; PROC GENMOD data=mydata; ods output ParameterEstimates = myestimates;... RUN; PROC SQL; CREATE TABLE gfsdb.isoest AS SELECT * FROM myestimates; QUIT;
Background Pitfalls or why I had to call the SAS helpline PROC SQL; CREATE TABLE gfsdb.isoest(dbtype= (Label= varchar(30) Estimate= numeric(12,5) StdErr= numeric(12,5) LowerCL= numeric(12,5) UpperCL= numeric(12,5) ChiSq= numeric(12,5) ProbChiSq= numeric(12,5) ) ) AS SELECT * FROM myestimates; QUIT;
Doing the same in R Background Using the RODBC package, the same can be accomplished in R. # R code require(rodbc) chan <- odbcconnect(dsn="gfsdb", uid = "postgres", case= postgresql,believenrows=false) qu <- paste("sql QUERY GOES HERE ") mydata <- sqlquery(chan, qu, errors= TRUE)
Summary RDBMS are unmatched for storing and retrieving data, spreadsheets can only go so far SAS, like all other statistical software, provides tools to access a variety of RDBMS and to invoke SQL queries to obtain data Once a query is developed, it can be used from a variety of software, hence facilitating collaborations with other scientists RDBMS can be configured to be accessible on the Internet, hence a powerful tool for sharing data
Summary RDBMS are unmatched for storing and retrieving data, spreadsheets can only go so far SAS, like all other statistical software, provides tools to access a variety of RDBMS and to invoke SQL queries to obtain data Once a query is developed, it can be used from a variety of software, hence facilitating collaborations with other scientists RDBMS can be configured to be accessible on the Internet, hence a powerful tool for sharing data
Summary RDBMS are unmatched for storing and retrieving data, spreadsheets can only go so far SAS, like all other statistical software, provides tools to access a variety of RDBMS and to invoke SQL queries to obtain data Once a query is developed, it can be used from a variety of software, hence facilitating collaborations with other scientists RDBMS can be configured to be accessible on the Internet, hence a powerful tool for sharing data
Summary RDBMS are unmatched for storing and retrieving data, spreadsheets can only go so far SAS, like all other statistical software, provides tools to access a variety of RDBMS and to invoke SQL queries to obtain data Once a query is developed, it can be used from a variety of software, hence facilitating collaborations with other scientists RDBMS can be configured to be accessible on the Internet, hence a powerful tool for sharing data
Questions and comments