Data Storage Options for SAS Applications: SAS Intelligent Storage
Bill Gibson, Chief Technology Officer, SAS Asia Pacific
Presentation Aims
- Outline the architecture of SAS Intelligent Storage.
- Highlight some existing features that you may not know about.
- Compare and contrast SAS storage with traditional RDBMSs.
- Outline future directions for SAS storage.
Agenda: SAS Intelligent Storage
- Overview
- SAS table features
- SAS storage architecture
- Data servers
- Open access to SAS data
- OLAP storage
- RDBMS storage
- Metadata (outside the scope of this talk)
- Conclusion
Intelligent Storage
Data storage is about reactive capture. Intelligence storage is about proactive exploitation.
[Figure: sources (transaction systems, web logs, demographic data, handheld devices, surveys, call center data, campaign data) and a customer-focused data warehouse, used to highlight KPIs, compare, trend, forecast, perform market analysis, identify, manage campaigns and predict.]
Intelligent Storage Types Based on Business Needs
Business need: storage type
- Sorting, ranking: relational
- Compare/contrast: HOLAP
- Trends, patterns, prediction: parallel
- Historical: nearline/offline
- Realtime: standards support
- Mixed structured/unstructured: metadata/portal
SAS Intelligence Storage Design Aims
- Proactive exploitation
- Scalable for data AND intelligence requirements
- Usability
- Manageability
- Interoperability
- Optimised for warehousing and decision support:
  - bulk, single-user loading in scheduled windows
  - read-only queries processing large amounts of data
  - focus on intelligence applications, not OLTP
Intelligence Storage Considerations
[Figure: storage choice by size (megabytes, gigabytes, terabytes) and by detail (relational) versus summary (multidimensional) data. Options shown: SAS tables, SPD Server (parallel), MDDB and HOLAP.]
SAS Relational Storage: Overview
- Relational: based around tables
- Accessed via SQL and the SAS 4GL
  - SAS offers much richer data manipulation than pure SQL
  - Set-based and procedural logic
- Multiple physical storage formats via the Multiple Engine Architecture
SAS Multiple Engine Architecture
- Decouples applications from physical storage
- Transparent access to many physical data stores
- View engines for virtual tables:
  - SQL views
  - DATA step views
- Foreign engines for:
  - other database tables
  - XML
  - ERP
  - flat files
[Figure: applications call the engine supervisor, which routes each request to an engine (Base, Remote, Tape, SPDS and others) over the underlying data.]
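The supervisor's role can be sketched in Python (not SAS) as a minimal illustration. It assumes a simplified model in which each library reference is bound to one engine object that knows how to read its own format. The names `Supervisor`, `MemoryEngine`, `CsvEngine` and `total_sales` are hypothetical. The point is that the application code stays the same whichever engine serves the library.

```python
import csv
import io


class MemoryEngine:
    """Hypothetical native engine: tables held in memory as lists of dicts."""

    def __init__(self, tables):
        self.tables = tables

    def read(self, table):
        return list(self.tables[table])


class CsvEngine:
    """Hypothetical foreign engine: tables stored as CSV text (stands in for flat files)."""

    def __init__(self, files):
        self.files = files

    def read(self, table):
        return list(csv.DictReader(io.StringIO(self.files[table])))


class Supervisor:
    """Routes two-level names (libref.table) to the engine assigned to that library."""

    def __init__(self):
        self.libraries = {}

    def libname(self, libref, engine):
        self.libraries[libref.lower()] = engine

    def read(self, name):
        libref, table = name.lower().split(".", 1)
        return self.libraries[libref].read(table)


def total_sales(supervisor, name):
    # Application logic: unaware of which engine or physical format is used.
    return sum(float(row["sales"]) for row in supervisor.read(name))


sup = Supervisor()
sup.libname("work", MemoryEngine({"sales": [{"sales": 1.5}, {"sales": 2.5}]}))
sup.libname("flat", CsvEngine({"sales": "sales\n3\n4\n"}))
print(total_sales(sup, "work.sales"), total_sales(sup, "flat.sales"))  # prints: 4.0 7.0
```

Supporting a new storage format in this model means adding an engine class and one `libname` call. `total_sales` does not change.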
Base Engine: Full Functionality of the SAS Table
- Supports all the functionality required by SAS statements and procedures
- Creates, maintains and uses indexes
- Supports compressed observations
- Enforces integrity constraints
- Creates audit trails
- Cross-environment data access (CEDA) without a remote server
- Open via ODBC
- Special features: generations of tables, in-memory tables, local file server
Physical Limits on SAS Tables (V8)
- Maximum columns: 32,767
- Maximum row length: 16,777,191 bytes
- Maximum rows: virtually unlimited; depends on the operating system
- Maximum table size is limited by the operating system: 4 giga-gigabytes (about 4 exabytes) on NT with NTFS, 4 TB on VMS
When Is Data Too Big for a Standard SAS Table?
- There are no hard rules; it depends on the environment and the application.
- Answer: when the time to process it doesn't meet user needs!
- Note that table size, not database size, is the issue. Many SAS databases are in the terabyte range.
- A standard SAS table is a single physical file, so its processing speed is limited by the I/O subsystem.
Rule of thumb: a table starts to get big at over 10 GB. Processing 10 GB takes 10-30 minutes on small hardware.
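The rule of thumb can be turned into a rough estimate. The figures of 10 GB in 10-30 minutes come from the slide. This Python sketch additionally assumes that processing is one sequential pass whose time grows linearly with table size, and it treats 1 GB as 1,024 MB. The function names are illustrative only.

```python
def implied_throughput_mb_s(table_gb, minutes):
    """Effective MB/s implied by processing table_gb in the given number of minutes."""
    return table_gb * 1024 / (minutes * 60)


def estimated_minutes(table_gb, throughput_mb_s):
    """Minutes for one sequential pass, assuming time scales linearly with size."""
    return table_gb * 1024 / throughput_mb_s / 60


fast = implied_throughput_mb_s(10, 10)  # about 17 MB/s
slow = implied_throughput_mb_s(10, 30)  # about 5.7 MB/s
for size_gb in (10, 50, 100):
    print(size_gb, "GB:", round(estimated_minutes(size_gb, fast)), "-",
          round(estimated_minutes(size_gb, slow)), "min")
```

At those implied rates, a 100 GB table would need roughly 100-300 minutes per pass. That is why it is the size of each table, not of the whole database, that matters.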
V9 New Features
- Scalable partitioned data format (SPDE) available
  - 2G columns
  - 2^63 rows
  - Adequate for most needs
- Support for threaded procedures
- Pipe engine for scalable cooperative processing
The SAS Storage Model
- Tables
  - Columns and rows
  - Embedded metadata in the table header
- Libraries
  - Collections of tables, typically in a single directory (except on OS/390)
SAS Storage Model: Terminology
SAS term: RDBMS term
- Data set: table
- Observation: row
- Variable: column
- Library: logical schema or table space?
- ?: database
- Missing: null
General-Purpose RDBMS Architecture
[Figure: users/applications connect to a database server (the DBMS, running on the operating system), which owns the database files.]
- All users share a single DBMS server.
- The DBMS server manages all physical data access, concurrency, rollback/recovery, constraints and security.
- Only the DBMS server understands the proprietary database file structure.
SAS Multi-Engine Architecture
[Figure: a SAS application calls the engine supervisor, which routes to Engine 1 and Engine 2 running on the operating system.]
- Each user has their own copy of SAS.
- Each library has an engine assigned, depending on the library type.
- The engine understands the table structure for that library.
- The operating system sees each table as an individual file and controls:
  - access to tables
  - physical data management and security
  - backup and recovery
SAS Multi-Engine Architecture: Multiple Users
[Figure: several SAS applications, each with its own engine supervisor and engines, sharing one operating system.]
- The operating system manages:
  - locking at table level
  - multi-user buffering
Multi-Engine Architecture (Base SAS Engine)
Advantages
- Simplicity
- Leverages OS features: openness, buffering, security, backup/archiving
- Multiple engines/storage formats
Disadvantages
- The OS cannot see inside tables: no row-level security, backup or rollback
- Less portable?
- Table = single physical file
- Less scalable?
SAS Internal Table Organisation
- Fixed-length storage for numeric and character columns, giving fixed-length rows (note 1)
- No row header information
- Rows stored sequentially
  - Any column can be accessed by its offset from the row start.
  - Any row can be accessed by its offset from the table start, in any order.
[Figure: table metadata header followed by the rows.]
Note 1: Provided the default features are specified (REUSE=NO, COMPRESS=NO).
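The offset arithmetic behind fixed-length rows can be sketched with Python's `struct` module. This is an illustrative layout only and not the real SAS file format. The 64-byte header and the column list are hypothetical. Numeric values are stored as 8-byte doubles, and character values are fixed width and blank padded.

```python
import struct

# Illustrative layout only; not the real SAS file format.
HEADER_SIZE = 64  # hypothetical fixed-size metadata header
COLUMNS = [("name", "8s"), ("region", "4s"), ("sales", "d")]  # char 8, char 4, 8-byte numeric
ROW_FORMAT = "<" + "".join(fmt for _, fmt in COLUMNS)  # "<": no padding between fields
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 8 + 4 + 8 = 20 bytes, the same for every row


def column_offset(column):
    """Byte offset of a column from the start of any row, plus its format code."""
    offset = 0
    for name, fmt in COLUMNS:
        if name == column:
            return offset, fmt
        offset += struct.calcsize("<" + fmt)
    raise KeyError(column)


def build_table(rows):
    """Header followed by rows packed back to back, with no per-row header."""
    data = bytearray(HEADER_SIZE)
    for row in rows:
        data += struct.pack(ROW_FORMAT, *row)
    return bytes(data)


def read_value(table, row_number, column):
    """Jump straight to one value: header + row offset + column offset."""
    offset, fmt = column_offset(column)
    position = HEADER_SIZE + row_number * ROW_SIZE + offset
    (value,) = struct.unpack_from("<" + fmt, table, position)
    return value


table = build_table([
    (b"ALICE   ", b"EAST", 120.5),
    (b"BOB     ", b"WEST", 80.0),
    (b"CAROL   ", b"EAST", 99.0),
])
print(read_value(table, 2, "sales"), read_value(table, 1, "region"))
```

Every row has the same size and there is no row header. Reading any value therefore costs one multiplication and two additions, regardless of row order or how wide the table is.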
SAS Internal Table Organisation: Benefits
- Very short CPU paths
- Efficient I/O for accessing subsets by row number
- Very rapid sequential access
- Rapid access to any column in wide tables, making wide tables manageable
  - Many databases have small limits on the number of columns (Oracle 8: 1,000 columns).
  - Wide tables are needed for tasks such as data mining.
Database Internals: Oracle (as an Example)
Oracle Structure
"Everything should be made as simple as possible, but not simpler." (Einstein)
Why Do SAS Tables Perform So Well?
- Many documented references (SEUGI papers) report cases where SAS processing greatly outperforms other DBMSs, especially for loading and for sequential processing such as summarisation and subsetting.
- Understanding the internals helps explain why:
  - a simple structure designed specifically for intelligence applications
  - 25 years of optimisation
SAS Database Servers
- Usually the engines in Base SAS provide all the database services that are needed.
- Network file services and CEDA provide some client/server functions.
- Specialist servers are available if required:
  - SAS/Share
  - SPD Server
SAS Database Servers: SAS/Share
[Figure: a SAS/Share server runs its own engine supervisor with Engine 1 and Engine 2 on the operating system. SAS Application 1 and SAS Application 2 each reach it through the Share engine in their own engine supervisor.]
SAS/Share
- Provides multi-user row-level locking for SAS tables
- Supports the Multi-Engine Architecture
- Can serve up data from SAS tables and other data sources
- Can process cross-source SQL joins and similar queries, for example through SQL pass-through to the server:

    proc sql;
      connect to remote (server=vegemite.shr9);
      select * from connection to remote
        ( --join sql statement------ );
      disconnect from remote;
    quit;
SAS Database Servers Summary: SAS/Share
- Lightweight OLTP server with row-level concurrent update
  - Control files, metadata
- Low-volume SQL query server
  - Standard SAS tables, other engines
  - Hybrid joins
- Supports ODBC clients (V8)
Don't use SAS/Share if a network file system can give SAS clients the read-only access they need.
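The practical difference between the table-level locking the operating system gives Base SAS tables and the row-level locking SAS/Share provides can be sketched in Python. `LockManager` is hypothetical and is not how SAS/Share is implemented. Two users are simulated with non-blocking lock requests from a single thread.

```python
import threading


class LockManager:
    """Hypothetical lock manager; the granularity decides what one lock covers."""

    def __init__(self, row_level):
        self.row_level = row_level
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = {}

    def _key(self, table, row):
        # Table level: every row of a table maps to one lock (like an OS file lock).
        # Row level: each (table, row) pair gets its own lock.
        return (table, row) if self.row_level else (table,)

    def try_lock(self, table, row):
        with self._guard:
            lock = self._locks.setdefault(self._key(table, row), threading.Lock())
        return lock.acquire(blocking=False)

    def unlock(self, table, row):
        with self._guard:
            lock = self._locks[self._key(table, row)]
        lock.release()


table_level = LockManager(row_level=False)
row_level = LockManager(row_level=True)

# User A locks row 1 for update; user B then asks for row 2 of the same table.
print(table_level.try_lock("sales", 1), table_level.try_lock("sales", 2))  # True False
print(row_level.try_lock("sales", 1), row_level.try_lock("sales", 2))      # True True
```

With table-level locks, the second user waits even though the two updates touch different rows. Row-level locking only serialises updates to the same row.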
SPD Server: Gigabyte-to-Terabyte per-Table Storage
- SMP parallel processing
- Advanced indexing
- Read/write/alter/control permissions
- Universal/group/individual access rights
- Row- and column-level security
- Login and data encryption
SAS Database Servers: SPD Server
[Figure: SAS Application 1 and SAS Application 2 each use the SPDS engine in their engine supervisor to reach the SPD Server over TCP via an SPD proxy process. The server runs SPDS threads 1 to n on the operating system.]
- Each table is partitioned.
- Parallel processing gives higher throughput.
Case Study: Telecom Italia Mobile
TIM's huge customer base meant that performance was a critical issue in the company's choice of an analytical CRM vendor. TIM processes about 100 million call records per day and has built a 3-terabyte SAS data warehouse that is accessed by SAS analytical CRM software. "SAS met our criteria for scalability and performance," says Cardone.
V9 SPDE Engine
- Single-user parallel partitioned engine
- Derived from SPDS
- Does not include the multi-user, management and administration features of SPDS
- File format compatible with SPDS
- Great match for the new scalable procedures
- Needs multiple CPUs and filesystems to perform
[Figure: in V9 Base SAS, applications call the supervisor, which routes to engines (Tape, Remote, Base, SPDE and others). The SPDE engine spreads a table's data across several data files.]
SAS Database Servers Summary: SPD Server/SPDE
- Heavyweight warehouse engine
- Very large table support (1-100+ GB)
- Requires multiple CPUs and filesystems to perform
See other SEUGI papers for more information.
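The partition-then-combine pattern behind SPD Server and SPDE can be sketched in Python. This is a structural illustration only. Round-robin partitioning and the helper names are assumptions, not SPDS internals. Because of Python's global interpreter lock, these threads do not speed up CPU-bound work. The real engines get their throughput from multiple CPUs reading partitions that sit on separate filesystems.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_partitions):
    """Spread rows round-robin over n partitions (stand-in for data files on separate filesystems)."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts


def scan_partition(part, predicate, column):
    """Apply the WHERE filter to one partition; return its partial sum and count."""
    total, count = 0.0, 0
    for row in part:
        if predicate(row):
            total += row[column]
            count += 1
    return total, count


def parallel_sum(parts, predicate, column, workers=4):
    """Scan all partitions concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda p: scan_partition(p, predicate, column), parts))
    return sum(t for t, _ in partials), sum(c for _, c in partials)


rows = [{"region": "EAST" if i % 2 == 0 else "WEST", "sales": float(i)} for i in range(10)]
parts = partition(rows, 3)
print(parallel_sum(parts, lambda r: r["region"] == "EAST", "sales"))  # (20.0, 5)
```

Each partition is filtered independently and only small partial results are combined. This is what lets the work spread across CPUs and I/O channels as partitions are added.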
Open Access to SAS Data
Tables
1. Universal ODBC driver (no SAS required)
2. ODBC, JDBC and OLE DB with the Local SAS Data Provider
3. ODBC, JDBC and OLE DB with a SAS server (servers: SHARE, SPD, IOM)
4. Exporting:
   1. SAS XML engine
   2. SAS/Access engines for DBMSs and PC file formats
MDDBs
- SAS Open OLAP Server: access from any client that supports the OLE DB for OLAP standard
Open Data Access Summary
- Universal ODBC: great for reading single SAS tables sequentially.
  - No SQL, no indexed WHERE.
  - Data source setup required per table.
  - Remote data accessed via FTP/HTTP.
- Local/Share and ODBC: great for accessing collections of tables.
  - SQL and indexed WHERE supported.
  - Remote data transparently accessed.
  - Any SAS library accessible, including DBMS libraries.
Note: ODBC or OLE DB access is also available for SPD Server data.
Storing Data in Non-SAS Formats
- The SAS Multiple Engine Architecture in V8 allows LIBNAME-based read/write/update to most relational databases.
- SAS/Access licence(s) needed.
- SAS/Access provides:
  - transparent reading from relational data stores
  - SQL queries to relational data stores
  - sophisticated and transparent writing/update
It's as Simple as a Libname

    libname ora oracle user=scott pass=xxxxx path='sas1';

    NOTE: Libref ORA was successfully assigned as follows:
          Engine:        ORACLE
          Physical Name: sas1
SAS/Access Features
- Many query optimisations
- Lots of V8-8.2 enhancements
- 4GL DATA step interfaces (keyed access using indexes)
- Bulk-load interfaces for high-volume DW loading (not all DBMSs)
- V9: parallel access
Summary Storage: SAS MDDBs
Megabyte-to-gigabyte summary storage
- Scalable
- Flexible security
- Basis for HOLAP
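The idea behind summary storage can be sketched in Python. Totals are pre-computed for every combination of dimensions, so a summary query becomes a lookup instead of a rescan of detail rows. This is a simplified full cube with hypothetical dimension and measure names. It is not the SAS MDDB file format.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dims, measure):
    """Pre-aggregate `measure` for every subset of `dims`, from grand total to full crossing."""
    cube = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            sums = defaultdict(float)
            for row in rows:
                sums[tuple(row[d] for d in subset)] += row[measure]
            cube[subset] = dict(sums)
    return cube


def query(cube, dims, **filters):
    """Answer a summary query by lookup in the matching pre-computed crossing."""
    subset = tuple(d for d in dims if d in filters)
    key = tuple(filters[d] for d in subset)
    return cube[subset].get(key, 0.0)


DIMS = ("region", "product")
detail = [
    {"region": "EAST", "product": "A", "sales": 10.0},
    {"region": "EAST", "product": "B", "sales": 5.0},
    {"region": "WEST", "product": "A", "sales": 7.0},
]
cube = build_cube(detail, DIMS, "sales")
print(query(cube, DIMS, region="EAST"), query(cube, DIMS, product="A"), query(cube, DIMS))
```

The trade-off is storage for speed. A full cube holds 2^n crossings for n dimensions, which is why summary stores are sized in mega- to gigabytes while detail stores can reach terabytes.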
Summary Storage: Hybrid OLAP
100s of MB to 100s of GB
- Data stored in multiple MDDBs, SAS tables, SPD tables and RDBMS tables
- See other SEUGI papers
[Figure: a SAS client connects to a SAS server through the HOLAP data provider, which draws on MDDBs, SAS tables, DB2 and SQL Server.]
Conclusion
So, what is the SAS database?
Sample SAS Application Storage Architecture
- Design the storage to meet the application's needs.
- Source data read from operational applications using SAS/Access
- Application metadata (control tables) in SAS/Share-controlled tables
- Working storage in SAS tables on the application server
- Main detail store in SPDS tables
- Reporting data in OLAP MDDBs
- Analysts' personal data in SAS tables on personal laptops, for offline access
Putting It Together
[Figure: one SAS session combining data from databases, data from a remote SAS server and local data.]
Putting It Together: Libnames Are the Key
What Is the SAS Database?
Intelligent storage for SAS applications!
- A distributed virtual database
  - Defined by the LIBNAMEs in effect
  - Central servers optional, reducing bottlenecks
- Optimised for intelligence applications
- Open via:
  - ODBC/JDBC, OLE DB
  - SQL library
  - XML interchange
Summary
- SAS Intelligent Storage is designed specifically for intelligence applications.
- It provides an efficient and cost-effective solution, with low overhead.
- It is being continually improved and refined.
Questions?