PharmaSUG Paper CC16

Need for Speed in Large Datasets: The Trio of SAS INDICES, PROC SQL and WHERE CLAUSE is the Answer

Kunal Agnihotri, PPD LLC, Morrisville, NC

ABSTRACT

Programming with large datasets can often become a time-consuming ordeal. One way to handle this type of situation is to use the powerful SAS INDEXES in conjunction with the WHERE clause in the PROC SQL step. This paper highlights how effective indexes can be created using SQL (more flexible than indexes created using the DATA step INDEX option or the DATASETS procedure) and how the data can then be subset using the WHERE clause, which drastically reduces dataset processing and run time. This combination of techniques gives better results when accessing big datasets than using either technique on its own. This paper also throws light on the dual functionality of SQL-created indexes, namely the ability to create indexes on both new and existing SAS data sets.

INTRODUCTION

A SAS index is a file that has a direct link to the values of a particular variable or set of variables in a data set. It is a snapshot of a part of the larger scheme of things: rather than a full copy of the original data, it holds the values of the key variables together with pointers to the observations that contain them. This specificity is what makes it so useful in accessing large data sets - this is the WHAT. The index resides in the same library as the data set on which it was created. SAS stores index entries in separate index files.

I'm sure that whoever has had to program on large/massive/humongous data sets has at some point wished for a way to speed up the painstakingly slow run times of the programs. Well, please stop wondering and join the SAS indexes bandwagon! - this is the WHY. An ideal and frequent candidate for SAS indexes is large lab or ECG data, or other such heavy-knit, long and wide data sets - this is the WHO.
Please be advised that SAS indexes do not need to be created for marginally big data sets. An index is useful only when the data set at hand is genuinely big. It is of course at the discretion of the user whether to create a SAS index. Just because an index has been created does not mean that SAS will use it every time a program could use it. SAS judges whether using the index is a viable option and reserves the right not to use it - which is a good thing. SAS does this to avoid using unnecessary CPU resources. Remember that an index consumes memory and resources when it is created, so it should be created and used ONLY when the situation demands. The larger the data set, the larger the index.

A point to be noted here is that the word large is a relative term. Some studies may consider a 100,000 KB data set to be a large data set. That size is not big enough to justify the use of indexes. The user should bear in mind that the index itself will take up considerable space, which might hamper performance due to disk space/resource constraints. An ideal large data set should be around or over the 2 GB threshold for indexes to work without compromising system performance.

VARIETY IS THE SPICE OF INDEXES

A SAS index can be created either on one variable (a simple index) or on multiple variables (a composite index). Each is explained with examples below. The variables used to create indexes are called index keys. Indexes can also be created using the INDEX= data set option in the DATA step or the DATASETS procedure. The data set option only creates an index for a new data set. The DATASETS procedure, on the other hand, creates an index only for an existing data set. Both these functionalities are available with PROC SQL. Besides giving access to a specific part of a large data set, an index can also return the observations in the order of its key variables. This eliminates the need to sort the data set in subsequent data processing.
This paper uses a LAB analysis data set as an example to create and use SAS indexes. The data set is 2.11 GB in size and has over a million records.
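For comparison with the PROC SQL approach, the two non-SQL routes mentioned above can be sketched as follows. This is a minimal illustration, not code from the paper: the toy data set, its size and the names ADLB_NEW and ADLB_OLD are assumptions made up for the example, and only the variable names USUBJID, AVISITN and PARAMCD and the index name SUB_VIS_PAR come from the text.

```sas
/* Route 1: INDEX= data set option, which builds the index while  */
/* a NEW data set is being created.                                */
/* The data below are hypothetical stand-ins for real lab data.    */
data work.adlb_new (index=(paramcd));
   length usubjid $8 paramcd $8;
   call streaminit(2024);
   do subj = 1 to 200;
      usubjid = cats('S', put(subj, z4.));
      do avisitn = 1 to 5;
         do paramcd = 'ALT', 'AST', 'BILI', 'CREAT', 'GLUC';
            aval = round(100 * rand('uniform'), 0.1);
            output;
         end;
      end;
   end;
   drop subj;
run;

/* A plain DATA step copy does not carry the index over, so        */
/* ADLB_OLD plays the role of an existing, unindexed data set.     */
data work.adlb_old;
   set work.adlb_new;
run;

/* Route 2: PROC DATASETS, which adds indexes to an EXISTING       */
/* data set without rewriting the data.                            */
proc datasets library=work nolist;
   modify adlb_old;
   index create paramcd;                                  /* simple index    */
   index create sub_vis_par = (usubjid avisitn paramcd);  /* composite index */
quit;
```

Running PROC CONTENTS on either data set afterwards lists the indexes that were defined, which is a convenient way to confirm the result.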
Figure 1: Create a simple INDEX using PROC SQL on an existing data set

Figure 1 demonstrates how to create a simple INDEX on an already existing SAS data set. The DATA step INDEX option does not have this functionality built in. SAS picks the name of the index: it is the same as the name of the key variable on which it is created. As shown in the log for the above program, a simple index was defined on the variable PARAMCD. Let's move on to creating a simple index on a new data set.

Figure 2: Create a simple INDEX using PROC SQL on a new data set

Figure 2 shows the functionality of the SQL procedure to create an index in real time, i.e., when the data set is being created in the same step as the index. The INDEX= data set option is used here. The DATASETS procedure does not have this functionality built in.
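The original screenshots for Figures 1 and 2 are not reproduced here, so the following is only a sketch of what such steps might look like, under the assumption of a small made-up ADLB data set in the WORK library. The target name ADLB2 is hypothetical.

```sas
/* Hypothetical toy data standing in for the large lab data set.   */
data work.adlb;
   length usubjid $8 paramcd $8;
   call streaminit(2024);
   do subj = 1 to 200;
      usubjid = cats('S', put(subj, z4.));
      do avisitn = 1 to 5;
         do paramcd = 'ALT', 'AST', 'BILI', 'CREAT', 'GLUC';
            aval = round(100 * rand('uniform'), 0.1);
            output;
         end;
      end;
   end;
   drop subj;
run;

/* Simple index on an EXISTING data set.                            */
/* For a simple index the index name must match the column name.   */
proc sql;
   create index paramcd on work.adlb(paramcd);
quit;

/* Simple index on a NEW data set: the INDEX= data set option is    */
/* attached to the table being created in the same step.            */
proc sql;
   create table work.adlb2 (index=(paramcd)) as
      select *
      from work.adlb;
quit;
```

In both cases the log should carry a NOTE that a simple index has been defined on PARAMCD.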
Don't be misled by the times shown in the log. The real and CPU times shown are the cumulative times of both the data set creation and the index creation.

Figure 3: Create a composite INDEX using PROC SQL

Figure 3 shows how a composite index can be created on an existing data set. The user is responsible for naming the index. USUBJID, AVISITN and PARAMCD are the key variables used to create the SUB_VIS_PAR composite index. As seen in the NOTE in the log for the program, a composite index has been defined on the ADLB data set.

SPEED THRILLS

One of the most time-consuming parts of working with a large data set is sorting it. Indexes eliminate the need to sort the data, provided the index is based on the variables on which the sort was intended. The user should be mindful of the fact that creating indexes also eats up system time and resources. Hence it is imperative that the user creates sensible indexes which the program can use often. Creating an index which will only be used once is not a programming performance booster. The user will need to assess the value of a particular index before actually creating it. The following examples compare the efficiency of using indexes versus not using them.

Figure 4: Processing analysis lab data using an INDEX

In example 4, a new data set is created using ADLB as the source, subsetting on the creatinine parameter test code CREAT. The source is the same data set used in example 3. The log has a message in the form of an INFO line which clearly states that an index was used to optimize the WHERE clause. Please note the time taken to create the subset. The INFO messages in the log can be activated by using the option MSGLEVEL=I.

Now, a PROC SORT step is used to create the same subset of data. This time, the source is a data set which does not have an index defined on it but is otherwise identical to the previous source data set in terms of data. The variables used in the BY statement are the same ones used as key variables when the index was created above. The WHERE clause is identical in both cases. However, the time taken to create the subset is considerably more. While it does take some time to create an index, the time it took to create the subset with the good ol' PROC SORT is far more than the combined time it took to create an index and create the subset.
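The comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's own program: the data are a small made-up ADLB, and the output names CREAT_IDX, ADLB_NOIDX and CREAT_SORT are hypothetical. On a data set this small SAS may well decide that an index is not worth using, and the timings will not resemble those of a 2 GB data set.

```sas
/* Ask SAS to write INFO lines about index usage to the log.        */
options msglevel=i;

/* Hypothetical toy data standing in for the large lab data set.    */
data work.adlb;
   length usubjid $8 paramcd $8;
   call streaminit(2024);
   do subj = 1 to 200;
      usubjid = cats('S', put(subj, z4.));
      do avisitn = 1 to 5;
         do paramcd = 'ALT', 'AST', 'BILI', 'CREAT', 'GLUC';
            aval = round(100 * rand('uniform'), 0.1);
            output;
         end;
      end;
   end;
   drop subj;
run;

/* Simple index plus the user-named composite index.                */
proc sql;
   create index paramcd on work.adlb(paramcd);
   create index sub_vis_par on work.adlb(usubjid, avisitn, paramcd);
quit;

/* Subset with a WHERE clause. SAS decides which index, if any, to  */
/* use. A composite index is normally a candidate only when the     */
/* condition involves its first key variable, so for a condition on */
/* PARAMCD alone the simple PARAMCD index is the likely choice.     */
proc sql;
   create table work.creat_idx as
      select *
      from work.adlb
      where paramcd = 'CREAT';
quit;

/* Comparison: an unindexed copy of the same data, subset and       */
/* sorted with PROC SORT on the same key variables.                 */
data work.adlb_noidx;
   set work.adlb;
run;

proc sort data=work.adlb_noidx (where=(paramcd = 'CREAT'))
          out=work.creat_sort;
   by usubjid avisitn paramcd;
run;
```

Comparing the real and CPU times reported in the log for the PROC SQL subset and the PROC SORT step gives the kind of comparison the paper describes.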
Figure 5: Processing analysis lab data using PROC SORT

A whopping amount of real time (even after deducting about 4 seconds for index creation) is saved with the use of the index. Although the time taken varies on a case-by-case basis, the user should bear in mind the considerable savings that can accumulate over the entire course of the program. The USUBJID, AVISITN and PARAMCD variables were used in the above example because they are used in most of the processing of the LAB data set during its development/validation process. The user can create multiple simple or composite indexes, but should justify their creation by using them more than once in the program at hand. The user will also not need to sort the output data set created while using the index separately. The resulting data set will be sorted on the key variables used to define the index even though there is no ORDER BY clause specified in the PROC SQL step above.

Some users may wonder whether TAGSORT would produce competitive times in the above scenario. The following example shows the difference.

Figure 6: Processing analysis lab data using the TAGSORT option in PROC SORT
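A TAGSORT variant of the sort step might look like the sketch below. It assumes a small made-up, unindexed data set named ADLB_NOIDX and a hypothetical output name CREAT_TAG. The only change from a regular sort is the TAGSORT option on the PROC SORT statement.

```sas
/* Hypothetical toy data standing in for the unindexed lab data.    */
data work.adlb_noidx;
   length usubjid $8 paramcd $8;
   call streaminit(2024);
   do subj = 1 to 200;
      usubjid = cats('S', put(subj, z4.));
      do avisitn = 1 to 5;
         do paramcd = 'ALT', 'AST', 'BILI', 'CREAT', 'GLUC';
            aval = round(100 * rand('uniform'), 0.1);
            output;
         end;
      end;
   end;
   drop subj;
run;

/* TAGSORT keeps only the BY variables and observation numbers in   */
/* the temporary sort files, trading processing time for disk space.*/
proc sort data=work.adlb_noidx (where=(paramcd = 'CREAT'))
          out=work.creat_tag
          tagsort;
   by usubjid avisitn paramcd;
run;
```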
The situation seems to have taken a turn for the worse. Except for the addition of the TAGSORT option, the conditions remain identical. TAGSORT did not add any value to the sort. Instead it added time to the process, which is exactly the opposite of what is intended here. The TAGSORT option may reduce the temporary disk space used during processing but is not of great help with sorting times in this case. Depending on the size of the data and the BY variables used, the time TAGSORT takes may be less than a regular sort, but it cannot beat the processing time set by the INDEXES. The following table recaps the exercise in the paper.

Method            Real Time (seconds)    CPU Time (seconds)
PROC SORT
TAGSORT option    1:32.65*
INDEX

* Minutes.

PASS THROUGH COMING THROUGH

Indexes can also be leveraged when accessing an Oracle database using the PROC SQL pass-through facility. Study-level data stored in the Oracle database can be massive, and retrieving and processing it can consume considerable time and system resources. Once the data is retrieved, it needs further processing depending on what has to be done with it. That is when indexes can be put into practice to cut down valuable time and system resources.

CONCLUSION

SAS indexes offer impressive time savings when used in processing large data sets. They eliminate the need to sort the data repeatedly, thereby increasing program efficiency. Care should be taken not to create indexes on trivial variables in the data set, as this can slow down processing. Programmers are encouraged to weigh up each index before creating it, as a well-chosen index brings faster processing. Using an index becomes beneficial if the related subsets of data will be used repeatedly in the course of the program.

REFERENCES

Michael A. Raithel, The Complete Guide to SAS Indexes, SAS Press.
Kenneth W. Borowiak, Effectively Using the Indices in an Oracle Database with SAS

ACKNOWLEDGEMENT

The author would like to thank his mentor Ken Borowiak for his encouragement and Latonya Murphy, Thomas Souers and Hunter Everton for their insightful comments on this paper.

DISCLAIMER

The contents of this paper are the work of the author and do not necessarily represent the opinions, recommendations, or practices of PPD, LLC.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Kunal Agnihotri
3900 Paramount Parkway
Morrisville NC

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Oracle is a Registered Trademark of Oracle Corporation. Other brand and product names are trademarks of their respective companies.
The IBM Cognos Platform for Enterprise Business Intelligence Highlights Optimize performance with in-memory processing and architecture enhancements Maximize the benefits of deploying business analytics
Evaluation Guide. Software vs. Appliance Deduplication
Evaluation Guide Software vs. Appliance Deduplication Table of Contents Introduction... 2 Data Deduplication Overview... 3 Backup Requirements... 6 Backup Application Client Side Deduplication... 7 Backup
Virtual Tape Systems for IBM Mainframes A comparative analysis
Virtual Tape Systems for IBM Mainframes A comparative analysis Virtual Tape concepts for IBM Mainframes Mainframe Virtual Tape is typically defined as magnetic tape file images stored on disk. In reality
Using SAS as a Relational Database
Using SAS as a Relational Database Yves DeGuire Statistics Canada Come out of the desert of ignorance to the OASUS of knowledge Introduction Overview of relational database concepts Why using SAS as a
Muse Server Sizing. 18 June 2012. Document Version 0.0.1.9 Muse 2.7.0.0
Muse Server Sizing 18 June 2012 Document Version 0.0.1.9 Muse 2.7.0.0 Notice No part of this publication may be reproduced stored in a retrieval system, or transmitted, in any form or by any means, without
Make Better Decisions with Optimization
ABSTRACT Paper SAS1785-2015 Make Better Decisions with Optimization David R. Duling, SAS Institute Inc. Automated decision making systems are now found everywhere, from your bank to your government to
MSU Tier 3 Usage and Troubleshooting. James Koll
MSU Tier 3 Usage and Troubleshooting James Koll Overview Dedicated computing for MSU ATLAS members Flexible user environment ~500 job slots of various configurations ~150 TB disk space 2 Condor commands
The Methodology Behind the Dell SQL Server Advisor Tool
The Methodology Behind the Dell SQL Server Advisor Tool Database Solutions Engineering By Phani MV Dell Product Group October 2009 Executive Summary The Dell SQL Server Advisor is intended to perform capacity
