Different ways of calculating percentiles using SAS
Arun Akkinapalli, eBay Inc, San Jose CA
ABSTRACT

Calculating percentiles (including quartiles) is a very common practice in data analysis. SAS offers several ways to do it, with some variation in the output. This paper compares these methods and their run times to help a programmer choose the option best suited to their scenario.

INTRODUCTION

A percentile is the value that marks a given percentage position within a range of values. The 25th percentile is referred to as the first quartile, the 50th percentile is the median, and the 75th percentile is the third quartile. Depending on where the data resides, the programmer can choose a method of calculating percentiles. Percentiles can be calculated using any one of the following procedures:

1) PROC UNIVARIATE
2) PROC MEANS
3) PROC SUMMARY
4) PROC REPORT

The programmer can also take advantage of SAS In-Database processing with Teradata to get approximate percentile values. We also evaluate PROC FREQ and explore a method that divides the data and derives percentiles with minimal transfer of data to SAS. The dataset used for comparison resides in Teradata and has around 80 million records and 100 columns. The SAS code included in this paper runs on SAS 9.3 in a UNIX environment. Note that output and run times may differ with system environments and settings.

METHODS TO CALCULATE PERCENTILES

Proc Univariate / Proc Stdize

These procedures provide a comprehensive solution for calculating percentiles. The PCTLPTS option specifies the percentile the user is looking for. PCTLDEF specifies the definition the procedure uses to calculate percentiles; the default is 5. The advantage of these procedures is the flexibility to calculate a percentile at any level, which is not the case with most of the other procedures.

They do not support SAS In-Database processing with Teradata, so the primary limitation is that the data must be stored locally in SAS. The syntax of PROC STDIZE is quite similar to that of PROC UNIVARIATE. Below is the syntax, followed by the time taken to transfer the data and calculate the 99.9th percentile value for a dataset with 80 million records and 100 columns, using the default definition of 5.

    proc univariate data=test.otl_chk;
      where metric_1 > 0;
      class DIM1 DIM2;
      var metric_1;
      /* write the 99.9th percentile to cap_val, with variable names prefixed by pcap */
      output out=cap_val pctlpts=99.9 pctlpre=pcap;
    run;
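The paper states that PROC STDIZE syntax is similar but does not show it. The following is a minimal sketch of what an equivalent call might look like, not the author's code. The demo dataset work.demo and the output names work.demo_std and work.cap_val_stdize are hypothetical, chosen only so the example can run without the Teradata table. Note that PROC STDIZE has no CLASS statement, so grouping is done with BY on sorted data.

```sas
/* Hypothetical demo data: two groups of 500 values each (not the paper's table) */
data work.demo;
  do dim1 = 'A', 'B';
    do i = 1 to 500;
      metric_1 = i;
      output;
    end;
  end;
  drop i;
run;

/* The data step above already writes rows in dim1 order, as BY requires.  */
/* PCTLMTD=ORD_STAT requests exact order-statistic percentiles, and        */
/* PCTLDEF=5 matches the default definition used by PROC UNIVARIATE.       */
/* OUT= holds the standardized copy of the data, which is not needed here; */
/* the requested percentile is written to the OUTSTAT= dataset.            */
proc stdize data=work.demo
            out=work.demo_std
            outstat=work.cap_val_stdize
            pctlmtd=ord_stat
            pctldef=5
            pctlpts=99.9;
  by dim1;
  var metric_1;
run;

/* Keep only the percentile rows of the OUTSTAT= dataset */
proc print data=work.cap_val_stdize noobs;
  where _type_ =: 'P';
run;
```

As with PROC UNIVARIATE, this runs on data held in SAS, so the same transfer cost would apply to a table that resides in Teradata.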
Run times for the PROC UNIVARIATE example:

    Transfer (TD - SAS)    Percentile Capping    Total
    42 minutes             27 minutes            69 minutes

Proc Means / Proc Summary

PROC MEANS and PROC SUMMARY also support calculating percentile values. A statistic keyword such as P1, P10, P25, P50, P90 or P99 must be specified to request the required percentile. QNTLDEF defines the method used to calculate the percentiles; the default value is 5. The advantage of this method is its support for In-Database processing, so the user does not have to transfer data from Teradata to SAS. The limitation is that only integer percentile levels can be requested directly, so a decimal level such as the 99.9th percentile cannot be obtained in one step. The user must first calculate the 99th percentile value, subset the data, and then take the 90th percentile of the subset, which is inefficient when handling big data with minimal transfer. Below is the syntax to calculate the 99.9th percentile value using PROC MEANS, with its run times. The PROC SUMMARY syntax is similar.

Step 1: Calculate the 99th percentile value on the input dataset.

    options SQLGENERATION=DBMS MSGLEVEL=I sastrace=',,,d' sastraceloc=saslog nostsuffix;
    libname indb teradata USER=xxxxxxx PASSWORD="xxxxxxx" database=TEST_PRCT_W tdpid="xxxxx";

    proc means data=indb.xl_h_0704 noprint;
      class DIM_1 DIM_2;
      var METRIC_1;
      output out=test.cap_val P99=P99;
    run;

Step 2: Filter the initial dataset to the records greater than or equal to the 99th percentile value obtained from the test.cap_val dataset above, and create a separate table in Teradata.

    proc sql;
      connect to teradata as td (USER=xxxxxxx PASSWORD="xxxxxxx" DATABASE=TEST_PRCT_W logdb=xxxxxx fastexport=yes TDPID="xxxxx" mode=teradata);
      /* 1000 is the 99th percentile value derived from the SAS dataset in Step 1 */
      execute (INSERT INTO TEST_PRCT_W.XL_H_0705
               SELECT * FROM TEST_PRCT_W.XL_H_0704
               WHERE METRIC_1 >= 1000) by td;
    quit;

Step 3: Calculate the 90th percentile value on the new dataset to obtain the final 99.9th percentile value.

    proc means data=indb.xl_h_0705 noprint;
      class DIM_1 DIM_2;
      var METRIC_1;
      output out=test.final_cap P90=P90;
    run;
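The reasoning behind the three steps is that the records at or above the 99th percentile make up about 1% of the data, and the 90th percentile of that tail leaves 10% of 1%, that is 0.1% of all records, above it. The sketch below illustrates this on a small local dataset and compares the result with a direct PROC UNIVARIATE calculation. It is an illustration under assumptions, not the author's code: the dataset work.demo, the macro variable p99_cut and the other work.* names are hypothetical, and no grouping variables are used.

```sas
/* Hypothetical demo data: the integers 1 to 100,000 (not the paper's table) */
data work.demo;
  do metric_1 = 1 to 100000;
    output;
  end;
run;

/* Step 1: 99th percentile of the full data */
proc means data=work.demo noprint;
  var metric_1;
  output out=work.step1 p99=p99;
run;

/* Store the cutoff in a macro variable instead of hard-coding it */
data _null_;
  set work.step1;
  call symputx('p99_cut', p99);
run;

/* Step 2: keep only the tail at or above the cutoff */
data work.tail;
  set work.demo;
  where metric_1 >= &p99_cut;
run;

/* Step 3: the 90th percentile of the tail approximates the overall 99.9th */
proc means data=work.tail noprint;
  var metric_1;
  output out=work.step3 p90=p999_two_step;
run;

/* Reference value: direct 99.9th percentile from PROC UNIVARIATE */
proc univariate data=work.demo noprint;
  var metric_1;
  output out=work.direct pctlpts=99.9 pctlpre=p;
run;

proc print data=work.step3 noobs; run;
proc print data=work.direct noobs; run;
```

The two printed values should agree closely. On real data with ties at the cutoff, the tail can hold slightly more than 1% of the records, so the two-step value is an approximation.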
Run times for the PROC MEANS approach:

    Transfer (TD - SAS)    Percentile Capping (Step 1 - Step 3)    Total
    0 minutes              19 minutes                              19 minutes

Proc Freq

PROC FREQ can be a good alternative to the methods above when handling big data. Using the cumulative frequency option, the user can get a similar result much more efficiently. The idea is to compute a cumulative frequency distribution on the initial dataset and filter the output for the specific value. The advantage is its support for In-Database processing: no data transfer from Teradata to SAS is required, and values at finer (decimal) percentile levels can be obtained without multiple passes over the dataset. On the flip side, the value obtained may not be as accurate as with the two methods above, and there may be issues when trying to process very large data (~10-11 billion records) in one go. Below is the syntax to calculate the 99.9th percentile value, with its run times.

Step 1: Get the cumulative frequency distribution of the input dataset.

    options SQLGENERATION=DBMS MSGLEVEL=I sastrace=',,,d' sastraceloc=saslog nostsuffix;
    libname indb teradata USER=xxxxxxx PASSWORD="xxxxxx" database=TEST_PRCT_W tdpid="xxxxxxx";

    proc freq data=indb.xl_h_0704 noprint;
      where METRIC_1 > 0;
      by DIM_1 DIM_2;
      /* OUTCUM adds the cumulative frequency and cumulative percent (cum_pct) to the output */
      tables METRIC_1 / out=TEST.poc_freq_1 outcum nofreq;
    run;

Step 2: Filter for the minimum value at which the cumulative percentage reaches 99.9.

    proc sql;
      create table pert_freq as
      select DIM_1, DIM_2, min(METRIC_1)
      from TEST.poc_freq_1
      where cum_pct >= 99.9
      group by DIM_1, DIM_2;
    quit;

Run times for the PROC FREQ approach:

    Transfer (TD - SAS)    Percentile Capping    Total
    0 minutes              2 minutes             2 minutes

Bucketing and subset of Data for Large Datasets

This method uses PROC SQL, the DATA step and Teradata to calculate the percentile values. It can be an alternative for cases where the PROC FREQ approach above is not efficient because of large data volumes. The limitation of this method is that it needs multiple passes over the data. The steps are outlined below.

Divide the data into 20 different buckets based on a static lookup that defines the starting and ending value of each bucket. The number of buckets may vary with the skewness of the data. The bucket id is already populated in the source dataset in this case.
Calculate the overall count of the dataset and the count of each bucket id. By dividing each bucket count by the overall count, in sorted bucket order, and taking a cumulative sum, the user can determine the bucket id that contains the 99.9th percentile value.

    proc sql;
      connect to teradata as td (USER=xxxxxxxx PASSWORD="xxxxxx" TDPID="xxxxxx" mode=teradata);
      create table BCKT_CNT as
        select * from connection to td
        (select bckt, count(*) as bckt_cnt from TEST.XL_H_0706 group by 1 order by 1);
      create table TOTAL_CNT as
        select * from connection to td
        (select count(*) as cnt from TEST.XL_H_0706);
    quit;

    proc sql;
      /* share of each bucket in the whole dataset, in percent */
      create table ttl as
      select a.bckt, (a.bckt_cnt / b.cnt) * 100 as bckt_shr
      from bckt_cnt a, total_cnt b
      order by a.bckt;   /* keeps the rows sorted for the BY statement below */
    quit;

    data csum;
      set ttl;
      by bckt;
      retain total;
      /* running cumulative share across buckets */
      total = sum(total, bckt_shr);
    run;

Once the bucket id is determined, the initial dataset can be filtered down to the data in that specific bucket. Dataset csum contains the required bucket id. The value associated with that bucket id is used as the filter threshold (4000 in the code below). We can filter the initial dataset from the source data as follows:

    proc sql;
      connect to teradata as td (USER=xxxxxx PASSWORD="xxxxxxx" TDPID="xxxx" mode=teradata);
      execute (insert into TEST.XL_H_0707
               select * from TEST.XL_H_0706
               where METRIC_1 > 4000) by td;
    quit;

Get the cumulative frequency distribution of the subset as in the PROC FREQ approach above. Then calculate the modified cumulative frequency that applies to the whole dataset, as shown below.

    options SQLGENERATION=DBMS MSGLEVEL=I sastrace=',,,d' sastraceloc=saslog nostsuffix;
    libname indb teradata USER=XXXXXX PASSWORD="xxxxxxxx" database=TEST tdpid="XXXXXXX";

    proc freq data=indb.exl_mn_0707 noprint;
      by DIM_1 DIM_2;
      tables METRIC_1 / out=poc_freq_2 outcum nofreq;
    run;

    proc sql;
      create table mn_2 as
      /* the numeric multiplier inside ( ) is missing in the source */
      select METRIC_1, (cum_pct * ( )) / 100 as cum_freq_19th
      from poc_freq_2;
    quit;

Calculate the cumulative sum with the starting point ( ) of the bucket as the base value, and choose the 99.9th percentile value from the dataset.

    data prctl;
      set mn_2;
      total = ;   /* starting point of the bucket; the value is missing in the source */
      retain total;
      total = sum(total, cum_freq_19th);
    run;

    proc sql;
      create table prctl_mn as
      /* a SAS variable name cannot start with a digit, so the column is named prctl_99_9 */
      select min(METRIC_1) as prctl_99_9
      from prctl
      where total > 99.9;
    quit;

Run times for the bucketing approach:

    Transfer (TD - SAS)    Percentile Capping    Total
    0 minutes              2.5 minutes           2.5 minutes

CONCLUSION

PROC UNIVARIATE / STDIZE is appropriate for smaller datasets held in SAS, as it provides a comprehensive solution. As data volume increases, and in scenarios where the data resides in a different database, choosing one of the In-Database procedures eliminates the data transfer and is an efficient way to calculate percentiles. PROC MEANS is a good alternative for calculating percentiles at integer levels (99, 50, 75, 10, etc.). If the data volume is close to a billion records and percentiles are needed at decimal levels (99.9, 75.8, 0.01), PROC FREQ serves as an effective method. For datasets larger than 1-2 billion records, bucketing and subsetting may yield the results users are looking for.

ACKNOWLEDGEMENTS

The author would like to thank eBay for allowing the use of the necessary information.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Arunkumar Akkinapalli
eBay Inc, 2525 North 1st Street, San Jose CA
aakkinapalli@ebay.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
Time: Location: Tuesday and Thursdays, 11:00am 12:15 pm Drachman Hall A319 Instructors: Angelika Gruessner, MS, PhD 626-3118 (office) Drachman Hall A224 acgruess@azcc.arizona.edu Office Hours: Monday Thursday
More informationA Recursive SAS Macro to Automate Importing Multiple Excel Worksheets into SAS Data Sets
PharmaSUG2011 - Paper CC10 A Recursive SAS Macro to Automate Importing Multiple Excel Worksheets into SAS Data Sets Wenyu Hu, Merck Sharp & Dohme Corp., Upper Gwynedd, PA Liping Zhang, Merck Sharp & Dohme
More informationAn Approach to Creating Archives That Minimizes Storage Requirements
Paper SC-008 An Approach to Creating Archives That Minimizes Storage Requirements Ruben Chiflikyan, RTI International, Research Triangle Park, NC Mila Chiflikyan, RTI International, Research Triangle Park,
More informationPost Processing Macro in Clinical Data Reporting Niraj J. Pandya
Post Processing Macro in Clinical Data Reporting Niraj J. Pandya ABSTRACT Post Processing is the last step of generating listings and analysis reports of clinical data reporting in pharmaceutical industry
More informationUsing Proc SQL and ODBC to Manage Data outside of SAS Jeff Magouirk, National Jewish Medical and Research Center, Denver, Colorado
Using Proc SQL and ODBC to Manage Data outside of SAS Jeff Magouirk, National Jewish Medical and Research Center, Denver, Colorado ABSTRACT The ability to use Proc SQL and ODBC to manage data outside of
More informationIntroduction to Proc SQL Steven First, Systems Seminar Consultants, Madison, WI
Paper #HW02 Introduction to Proc SQL Steven First, Systems Seminar Consultants, Madison, WI ABSTRACT PROC SQL is a powerful Base SAS Procedure that combines the functionality of DATA and PROC steps into
More informationMBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
More informationMake Better Decisions with Optimization
ABSTRACT Paper SAS1785-2015 Make Better Decisions with Optimization David R. Duling, SAS Institute Inc. Automated decision making systems are now found everywhere, from your bank to your government to
More informationIntegrating SAS with JMP to Build an Interactive Application
Paper JMP50 Integrating SAS with JMP to Build an Interactive Application ABSTRACT This presentation will demonstrate how to bring various JMP visuals into one platform to build an appealing, informative,
More informationEfficient Techniques and Tips in Handling Large Datasets Shilong Kuang, Kelley Blue Book Inc., Irvine, CA
Efficient Techniques and Tips in Handling Large Datasets Shilong Kuang, Kelley Blue Book Inc., Irvine, CA ABSTRACT When we work on millions of records, with hundreds of variables, it is crucial how we
More informationPharmaSUG AD08. Maximize the power of %SCAN using WORDSCAN utility Priya Saradha, Edison, NJ
PharmaSUG AD08 Maximize the power of %SCAN using WORDSCAN utility Priya Saradha, Edison, NJ This paper demonstrates the power of %SCAN macro function using the user-defined utility, WORDSCAN, and illustrates
More informationTechnical Paper. Defining an ODBC Library in SAS 9.2 Management Console Using Microsoft Windows NT Authentication
Technical Paper Defining an ODBC Library in SAS 9.2 Management Console Using Microsoft Windows NT Authentication Release Information Content Version: 1.0 October 2015. Trademarks and Patents SAS Institute
More informationFinding National Best Bid and Best Offer
ABSTRACT Finding National Best Bid and Best Offer Mark Keintz, Wharton Research Data Services U.S. stock exchanges (currently there are 12) are tracked in real time via the Consolidated Trade System (CTS)
More informationFun with PROC SQL Darryl Putnam, CACI Inc., Stevensville MD
NESUG 2012 Fun with PROC SQL Darryl Putnam, CACI Inc., Stevensville MD ABSTRACT PROC SQL is a powerful yet still overlooked tool within our SAS arsenal. PROC SQL can create tables, sort and summarize data,
More informationMODUL 8 MATEMATIK SPM ENRICHMENT TOPIC : STATISTICS TIME : 2 HOURS
MODUL 8 MATEMATIK SPM ENRICHMENT TOPIC : STATISTICS TIME : 2 HOURS 1. The data in Diagram 1 shows the body masses, in kg, of 40 children in a kindergarten. 16 24 34 26 30 40 35 30 26 33 18 20 29 31 30
More informationSAS vs DB2 Functionality Who Does What Where?! Harry Droogendyk Stratia Consulting Inc.
SAS vs DB2 Functionality Who Does What Where?! Harry Droogendyk Stratia Consulting Inc. BIG Mountain of Data Move the Mountain? Move the Mountain! Our World Expensive Data Pulls create table visa_bal as
More informationPaper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation
Paper 109-25 Merges and Joins Timothy J Harrington, Trilogy Consulting Corporation Abstract This paper discusses methods of joining SAS data sets. The different methods and the reasons for choosing a particular
More informationCombining SAS LIBNAME and VBA Macro to Import Excel file in an Intriguing, Efficient way Ajay Gupta, PPD Inc, Morrisville, NC
ABSTRACT PharmaSUG 2013 - Paper CC11 Combining SAS LIBNAME and VBA Macro to Import Excel file in an Intriguing, Efficient way Ajay Gupta, PPD Inc, Morrisville, NC There are different methods such PROC
More information2. Filling Data Gaps, Data validation & Descriptive Statistics
2. Filling Data Gaps, Data validation & Descriptive Statistics Dr. Prasad Modak Background Data collected from field may suffer from these problems Data may contain gaps ( = no readings during this period)
More information