SAS Programming Efficiency

SAS Programming Efficiency: A Brief Introduction
JC Wang

Overview

If you write or maintain production programs and work with large data sets, it is important to optimize your programs with respect to the technical environment and the resource constraints at your site:
- CPU time
- real time (or user time or elapsed time)
- memory
- data storage space
- I/O
- development time (or programmer time)

Threaded Processing with the SAS Scalable Performance Data Engine (version 9.1+)

Threading improves performance by enabling a SAS session to use multiple I/O channels and multiple CPUs.
- A thread is a single, independent flow of control through a program or within a process.
- The SAS Scalable Performance Data (SPD) Engine supports threaded I/O through partitioned data sets.
- Thread-enabled SAS procedures include SORT, SQL, MEANS, SUMMARY, REPORT, and TABULATE.

Assessing Efficiency Needs: Technical Environment

- Hardware: available memory, number of CPUs, number and type of peripheral devices, communication hardware, network bandwidth, storage capacity, I/O bandwidth, capacity to upgrade
- Operating environment: resource allocation, scheduling algorithms, I/O methods
- System load: number of users or jobs sharing system resources, network traffic, predicted increase in load
- SAS environment: SAS products installed, number of CPUs and amount of memory allocated for SAS programming, methods of running SAS

Assessing Efficiency Needs: Programs

- Program size: focus on improving the efficiency of large programs.
- Number of runs: focus on improving programs that are run many times.

Assessing Efficiency Needs: Data

- Data volume: focus on improving the efficiency of programs that use large data sets or many data sets.
- Data type.

Assessing Efficiency Needs: Understanding Efficiency Trade-offs

- Decreasing disk space might increase CPU time.
- Decreasing I/O (reading or writing more data at one time) might increase memory use.
- Decreasing real time (enabling threading) might increase CPU time.
- Decreasing any computing resource (increasing program complexity) might increase programmer time.

Assessing Efficiency Needs: Using SAS System Options to Track Resources

Key to the codes in parentheses:
1. Operating system: z/OS (z), UNIX/Linux (U), or Windows (W)
2. How the option is set: at invocation (i) or by an OPTIONS statement (O)
3. Whether the option is on by default (d)

- STIMER (z/i/d, UW/iO/d): CPU time and real time are tracked and written to the SAS log.
- MEMRPT (z/iO/d): memory usage statistics are tracked.
- FULLSTIMER (z/iO, UW/iO): all available resource usage statistics are tracked and written to the SAS log. Under z/OS, the alias is FULLSTATS, and the option is ignored if NOSTIMER and NOMEMRPT are in effect. Under Windows, some statistics are not accurate unless the option is set at invocation.
- STATS (z/iO/d): writes the statistics that are tracked by any combination of the options above.
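As a minimal sketch of the option usage described above, the following turns on FULLSTIMER with an OPTIONS statement (the "O" method, as under UNIX/Linux or Windows), runs one step, and then turns the option off again. The data set name work.timing_test and the loop are hypothetical and only give the step something to measure.

```sas
/* Report all available resource usage statistics in the SAS log. */
options fullstimer;

/* Hypothetical step to be measured: accumulate a running total. */
data work.timing_test;
   do i = 1 to 1000000;
      total + i;   /* sum statement: TOTAL is retained across iterations */
   end;
   output;
run;

/* Turn the detailed report off again after the measurement. */
options nofullstimer;
```

The statistics for the DATA step appear in the log note that follows the step; the exact fields reported depend on the operating environment.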

Benchmarking Resource Usage

- Turn on the resource usage reporting options before benchmarking.
- Execute the code for each programming technique in a separate SAS session, and include only the parts that are essential for performing the task.
- Run the code for each programming technique several times.
- Run the tests under the conditions in which the production program will run.
- Turn off the resource usage reporting options after the benchmarking tests.

Measuring I/O

When a SAS DATA step reads SAS data:
1. Data is read from the SAS data set (on the storage device) into a buffer (memory). I/O is measured here.
2. The PDV is created (within memory).
3. Data is transferred to an output buffer (memory).
4. Data is written to the output SAS data set. I/O is measured here.

When a SAS DATA step reads raw data:
1. Data is read from the raw data file (on the storage device) into a buffer (memory). I/O is measured here.
2. Data is copied to the input buffer (memory).
3. The PDV is created (within memory).
4. Data is transferred to an output buffer (memory).
5. Data is written to the output SAS data set. I/O is measured here.

Controlling Page Size and Number of Buffers

- Page: the unit of data transfer between the storage device and memory. A page includes the bytes used by the descriptor portion, the data values, and any overhead. Its size is fixed when the data set is created, either to a default value or to a user-specified value.
- Page size: the amount of data that can be transferred to one buffer in a single I/O operation.
- Increasing the page size reduces the number of times SAS reads from or writes to storage, at the cost of increased memory consumption.
- The page size is reported by PROC CONTENTS.

Controlling Page Size and Number of Buffers (continued)

- Use the BUFSIZE=MIN | MAX | n option (n in bytes) to control the page size. Avoid MIN. Use BUFSIZE=0 to reset to the default.
- Use the BUFNO=MIN | MAX | n option to control the number of buffers. MIN (that is, n=0) is the default and causes SAS to use the (OS-dependent) minimum optimal value. MAX is the (OS-dependent) maximum possible number, which can reach 2^31 - 1. The recommended maximum is 10.
- The product of the page size and the number of buffers determines how much data can be transferred in one I/O operation.
- These two settings have minimal effect on CPU usage.
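The two settings above can also be given as data set options; a small sketch follows. It assumes the SASHELP.CLASS sample table is available, and the name work.class_big and the 64 KB page size are illustrative choices, not recommendations from the slides.

```sas
/* BUFSIZE= takes effect when the data set is created,    */
/* so the page size is set on the output data set.        */
data work.class_big (bufsize=65536);
   set sashelp.class;
run;

/* BUFNO= applies to this step only: read with 10 buffers, */
/* the recommended maximum mentioned above.                */
data _null_;
   set work.class_big (bufno=10);
run;

/* PROC CONTENTS reports the page size and number of pages. */
proc contents data=work.class_big;
run;
```

With these values, up to 10 x 65,536 = 655,360 bytes can be transferred in one I/O operation. A table this small gains nothing from the change; the settings matter for large data sets.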

Controlling Page Size and Number of Buffers: General Recommendations

- For a small data set, allocate as many buffers (BUFNO) as there are pages in the data set, especially when the same observations are read several times during processing.
- Under z/OS, as data set size increases, increase BUFNO rather than BUFSIZE.

SASFILE Statement

Purpose:
- To hold an existing SAS data file in memory so that it is available to multiple program steps (for input or update processing, but not for utility or output processing).
- To reduce I/O and CPU time in a program that reads the same data multiple times.

Syntax:
SASFILE SAS-data-file <(password-option(s))> OPEN | LOAD | CLOSE;

Notes:
- I/O is reduced if there is sufficient real memory. Otherwise, SAS might use virtual memory (which degrades performance) or might use the default number of buffers.
- If you process part of a large SAS data set repeatedly, use a DATA step to create a subset that fits into memory, and use the SASFILE statement with that subset.
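A minimal sketch of the SASFILE statement in use, assuming the SASHELP.CLASS sample table is available. The copy into WORK and the two procedure steps are hypothetical stand-ins for a program that reads the same file several times.

```sas
/* Make a working copy of a sample table. */
data work.class;
   set sashelp.class;
run;

/* Open the file, allocate buffers, and read it into memory. */
sasfile work.class load;

/* Both steps now read the data from memory, not from disk. */
proc means data=work.class;
   var height weight;
run;

proc freq data=work.class;
   tables sex;
run;

/* Close the file and free the buffers. */
sasfile work.class close;
```

OPEN differs from LOAD in that it allocates the buffers but defers reading the data until a step first requests it.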

IBUFSIZE= System Option (version 9+)

1. Use the IBUFSIZE=MIN | MAX | n system option to specify the page size of an index file, especially when
   - there are many levels in the index, or
   - the index values are large in length.
2. Avoid using MIN.
3. IBUFSIZE=0 resets to the system default.
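The following sketch sets IBUFSIZE= before creating an index and then resets it. It assumes the SASHELP.CLASS sample table is available; the 16,384-byte value and the choice of NAME as the index key are illustrative only.

```sas
/* Set the index page size before the index is created. */
options ibufsize=16384;

data work.class;
   set sashelp.class;
run;

/* Create a simple index on NAME; it is built with the page size set above. */
proc datasets library=work nolist;
   modify class;
   index create name;
quit;

/* Reset to the system default. */
options ibufsize=0;
```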