CC03 PRODUCING SIMPLE AND QUICK GRAPHS WITH PROC GPLOT




Sheng Zhang, Xingshu Zhu, Shuping Zhang, Weifeng Xu, Jane Liao, and Amy Gillespie
Merck and Co. Inc, Upper Gwynedd, PA

Abstract

PROC GPLOT is a widely used SAS/GRAPH procedure for producing scatter, line, box, and Kaplan-Meier plots. In this paper we present step-by-step procedures for generating several common graphs, highlighting the INTERPOL= option of the SYMBOL statement. By combining various options with PROC GPLOT, we explore the flexibility and broad functionality of this SAS/GRAPH procedure for producing graphics simply and quickly.

Introduction

SAS/GRAPH is often used to represent the relationships among data values visually. Several procedures serve this purpose, including PROC GPLOT, PROC GCHART, and PROC BOXPLOT. Among these, PROC GPLOT is a popular and important procedure for generating quality graphs. It lets users enhance the appearance of a graph by controlling the axes and the positions of many elements in the graph panel. This paper starts with simple SAS code for a basic scatter plot. The code is then extended, step by step, to produce line plots, box plots, mean plots with error bars, and Kaplan-Meier plots.

Dataset

The data used for illustration are antibody titer measurements with their corresponding times, stored in a dataset called ANALYZE.sas7bdat. A partial listing of the dataset is shown below:

an  treatmnt    age  weight  interval  censor  phase  time  titer
1   Drug 50 mg  31   79      389       1       2      383   403.50
2   Placebo     46   63      864       0       5      714   330.51
3   Placebo     51   71      806       1       5      690   339.38
4   Drug 50 mg  34   84      200       1       1      110   256.93
5   Placebo     48   73      217       1       1      208   195.09
6   Drug 50 mg  46   78      455       1       3      331   469.77
7   Drug 50 mg  39   91      492       1       4      291   473.79

The ANALYZE dataset contains nine variables and 100 records. The variable labels are as follows:

an        Subject identification number
treatmnt  Treatment group description
age       Subject age
weight    Subject weight
interval  Time in the trial
censor    Censoring indicator
phase     Treatment phase
time      Time in hours corresponding to the titer measurement
titer     Antibody titer

(1) Scatter plot

A scatter plot places ordered pairs on a coordinate plane, showing the relationship between two variables. It can be generated with PROC GPLOT. Three variables are used in the PLOT statement: the y-variable titer, the x-variable time, and the group variable treatmnt. ANALYZE is the name of the dataset.

   goptions reset=all;
   proc gplot data=analyze;
      plot titer*time=treatmnt;
   run;
   quit;

Note: Graphics option statements carry over from one procedure to the next. It is therefore recommended to reset all option values to their defaults with 'goptions reset=all' before any SAS plot procedure.

One of the most important applications of a scatter plot is exploring the relationship between two variables, for example by fitting a regression line with 95% confidence limits. This can be accomplished by adding the INTERPOL=RLCLM option to the SYMBOL statement (the confidence level defaults to 95%), as shown below.

   symbol1 interpol=rlclm;
   proc gplot data=analyze;
      plot titer*time;   /* PLOT statement missing from the source text; titer*time is assumed */
   run;
   quit;
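To try the scatter plot and regression examples without the full 100-record file, one option is to type in the seven rows listed in the Dataset section. The sketch below does that and then fits a separate regression line with 95% confidence limits for each treatment group. The colors, plot symbols, and axis labels are illustrative choices, not taken from the paper, and the seven rows are only a small stand-in for the real ANALYZE dataset.

```sas
/* Stand-in for ANALYZE: only the seven rows listed in the paper. */
data analyze;
   infile datalines dlm=',' dsd;
   length treatmnt $10;
   input an treatmnt $ age weight interval censor phase time titer;
   datalines;
1,Drug 50 mg,31,79,389,1,2,383,403.50
2,Placebo,46,63,864,0,5,714,330.51
3,Placebo,51,71,806,1,5,690,339.38
4,Drug 50 mg,34,84,200,1,1,110,256.93
5,Placebo,48,73,217,1,1,208,195.09
6,Drug 50 mg,46,78,455,1,3,331,469.77
7,Drug 50 mg,39,91,492,1,4,291,473.79
;
run;

goptions reset=all;

/* One SYMBOL definition per treatment group (colors and symbols are assumptions). */
/* RLCLM95 = linear regression with 95% confidence limits for the mean.           */
symbol1 value=dot    color=blue interpol=rlclm95;
symbol2 value=circle color=red  interpol=rlclm95;

/* Axis labels are illustrative. */
axis1 label=('Time (hours)');
axis2 label=(angle=90 'Antibody titer');

proc gplot data=analyze;
   plot titer*time=treatmnt / haxis=axis1 vaxis=axis2;
run;
quit;
```

With so few points per group the confidence limits will be very wide, so this is useful for checking the code rather than for interpreting the fit.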

(2) Line plot

A line plot resembles a scatter plot in that it plots each individual data point on the graph. The difference is that a line plot connects the data points, using one of several interpolation methods. By adding a SYMBOL statement with different INTERPOL= values to PROC GPLOT, many useful graphs can be generated. INTERPOL= can be abbreviated as I=. Our second example of the INTERPOL= option is the line plot.

   symbol1 interpol=join;
   proc gplot data=analyze;
      where phase eq 1;
      plot titer*time;   /* PLOT statement missing from the source text; titer*time is assumed */
   run;
   quit;

Note: Here we subset the ANALYZE dataset to phase equal to 1 to reduce the density of data points and to better illustrate the different interpolation methods. JOIN connects the points in the order in which they occur in the dataset, so the data are usually sorted by the x-variable first.

Some of the options that can be used to connect data points are listed below.

INTERPOL=JOIN connects the data points with straight lines.
INTERPOL=SPLINE connects the data points with a smooth line.
INTERPOL=SM<nn> connects the data points with a smooth line whose degree of smoothness nn ranges from 0 to 99. The larger nn is, the smoother the line.

(3) Box plot

A box plot displays numerical data through five summary statistics (minimum, Q1, median, Q3, maximum). It shows the differences between groups by indicating the spread, skewness, and outliers of the data. PROC GPLOT with the INTERPOL=BOXT option in the SYMBOL statement (the BOX interpolation with the T suffix, which adds caps to the ends of the whiskers) draws the box plot of titer in the ANALYZE dataset.

   symbol interpol=boxt;
   proc gplot data=analyze;
      plot titer*treatmnt;
   run;
   quit;

Notes: The OFFSET=(a, b) option can be used in the AXIS statement for the x-axis to move the first and last boxes away from the ends of the axis. Box plots can also be generated by other SAS procedures, such as PROC UNIVARIATE and PROC BOXPLOT.
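The two notes on the box plot can be sketched as follows. The first step applies OFFSET= to the horizontal axis so that the boxes do not sit on the axis ends. The second draws a comparable plot with PROC BOXPLOT, which expects the input to be sorted by the group variable. The 15 percent offsets, the axis labels, and the dataset name ANALYZE_S are illustrative assumptions, not values from the paper.

```sas
goptions reset=all;

/* BOXT: box plot with caps on the whiskers; VALUE= sets the symbol used */
/* for points drawn beyond the whiskers.                                 */
symbol1 interpol=boxt value=dot;

/* OFFSET= leaves space before the first and after the last box */
/* (15 pct is an arbitrary illustrative value).                  */
axis1 offset=(15 pct, 15 pct) label=('Treatment group');
axis2 label=(angle=90 'Antibody titer');

proc gplot data=analyze;
   plot titer*treatmnt / haxis=axis1 vaxis=axis2;
run;
quit;

/* Alternative with PROC BOXPLOT; ANALYZE_S is a hypothetical name */
/* for the copy of ANALYZE sorted by the group variable.           */
proc sort data=analyze out=analyze_s;
   by treatmnt;
run;

proc boxplot data=analyze_s;
   plot titer*treatmnt;
run;
```

PROC BOXPLOT chooses its own default whisker style and layout, so the two plots will not look identical even though they summarize the same data.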

(4) Mean plot with bars

A variety of error bars around the means can be plotted easily with the INTERPOL= option in PROC GPLOT. These include standard error bars, one standard deviation bars, and 95% confidence interval bars. INTERPOL=HILOTJ expects several y values at each x value: it draws a vertical bar from the lowest to the highest value, marks their mean, adds caps to the bar ends (T), and joins the means across x values (J). The plot dataset therefore holds three observations (lower limit, mean, upper limit) for each phase within a treatment group. The following code shows, step by step, how to draw a plot of the means with standard error bars.

1. Output a dataset containing the mean and standard error for each phase by treatment:

   proc sort data=analyze out=temp_;
      by treatmnt phase;
   run;
   ods output 'Summary statistics'=stats;
   proc means data=temp_ mean std stderr clm;
      var titer;
      by treatmnt phase;
   run;
   ods output close;

2. Create a dataset containing the lower, mean, and upper values for each phase by treatment:

   data plotds (keep=treatmnt phase y);
      set stats;
      y=titer_mean; output;
      y=titer_mean-titer_stderr; output;
      y=titer_mean+titer_stderr; output;
   run;
   proc sort data=plotds;
      by treatmnt phase;
   run;

3. Draw the plot using PROC GPLOT with the INTERPOL=HILOTJ option:

   symbol interpol=hilotj;
   proc gplot data=plotds;
      plot y*phase=treatmnt;
   run;
   quit;

Note: Other types of error bars around the mean can be drawn by changing step 2. For one standard deviation bars, replace titer_stderr with titer_stddev. For 95% confidence interval bars, use titer_lclm and titer_uclm as the lower and upper values.
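As a sketch of the 95% confidence interval variant mentioned in the note, step 2 can output the confidence limits directly instead of adding and subtracting the standard error. This assumes the STATS dataset from step 1 was created with the CLM keyword, so that titer_lclm and titer_uclm are present. The dataset name PLOTCI, the colors, the line styles, and the axis settings are illustrative assumptions.

```sas
/* Three observations per treatment and phase: lower limit, mean, upper limit. */
/* PLOTCI is a hypothetical dataset name.                                      */
data plotci (keep=treatmnt phase y);
   set stats;
   y = titer_lclm; output;
   y = titer_mean; output;
   y = titer_uclm; output;
run;

proc sort data=plotci;
   by treatmnt phase;
run;

goptions reset=all;

/* One SYMBOL definition per treatment group (colors and line styles are assumptions). */
symbol1 interpol=hilotj color=blue line=1;
symbol2 interpol=hilotj color=red  line=2;

/* A small offset keeps the first and last bars off the axis ends. */
axis1 offset=(5 pct, 5 pct) label=('Treatment phase');
axis2 label=(angle=90 'Mean antibody titer (95% CI)');

proc gplot data=plotci;
   plot y*phase=treatmnt / haxis=axis1 vaxis=axis2;
run;
quit;
```

Because the confidence limits from PROC MEANS are symmetric about the mean, the mean of the three y values at each phase equals the group mean, so the tick that HILO draws falls where it should.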

(5) KM plot

A Kaplan-Meier (KM) plot is a common way to describe and graph survival characteristics. To calculate the survival function, the input dataset must contain one observation per patient and must include a time variable and a censoring variable. In our sample dataset the time variable is interval and the censoring variable is censor. The KM plot can then be drawn using PROC GPLOT with the INTERPOL=STEPLJ option in the SYMBOL statement. The code is very similar to the one used in the previous example (4). The TEMP dataset is described in detail below.

   symbol i=steplj;
   proc gplot data=temp;
      plot survival*interval=treatmnt;
   run;
   quit;

The input dataset TEMP in the code above is the output of PROC LIFETEST run on the ANALYZE dataset. It has the following structure.

Temp dataset from PROC LIFETEST

Treatmnt    Interval  Survival  SDF_LCL  SDF_UCL
Drug 50 mg  0         1.000     1.000    1.000
Drug 50 mg  197       0.980     0.866    0.997
Drug 50 mg  ...
Drug 50 mg  1037      0.060     0.016    0.149
Drug 50 mg  1440      ...
Placebo     0         1.000     1.000    1.000
Placebo     204       0.980     0.866    0.997
Placebo     ...
Placebo     822       0.120     0.049    0.226
Placebo     1434      ...

   proc sort data=analyze;
      by treatmnt;
   run;
   proc lifetest data=analyze;
      by treatmnt;
      time interval*censor(0);
      survival out=temp;
   run;

Because of the BY statement, ANALYZE must be sorted by treatmnt before PROC LIFETEST is run. PROC LIFETEST uses the TIME statement to run the survival analysis, where censor(0) declares 0 as the value that marks a censored observation. The temporary dataset TEMP is created by the SURVIVAL statement and contains all the data points needed for the KM plot.
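Putting the pieces of this section in execution order, a minimal end-to-end sketch might look like the following. It keeps the paper's TIME and SURVIVAL statements and adds a few cosmetic settings. The sorted dataset name ANALYZE_S, the colors, the line styles, the axis labels, and the fixed 0 to 1 vertical axis are illustrative assumptions rather than part of the original code.

```sas
/* BY-group processing needs sorted input; ANALYZE_S is a hypothetical name. */
proc sort data=analyze out=analyze_s;
   by treatmnt;
run;

/* NOPRINT suppresses the printed tables; TEMP is still created. */
proc lifetest data=analyze_s noprint;
   by treatmnt;
   time interval*censor(0);     /* 0 marks a censored observation */
   survival out=temp;
run;

goptions reset=all;

/* STEPLJ: step function with the data point at the left of each step, */
/* and the steps joined by vertical lines.                              */
symbol1 interpol=steplj color=blue line=1 value=none;
symbol2 interpol=steplj color=red  line=2 value=none;

axis1 label=('Time in trial');
axis2 label=(angle=90 'Survival probability') order=(0 to 1 by 0.2);

proc gplot data=temp;
   plot survival*interval=treatmnt / haxis=axis1 vaxis=axis2;
run;
quit;
```

Fixing the vertical axis at 0 to 1 makes curves from different analyses easier to compare, at the cost of some empty space when survival stays high.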

Conclusion

We have shown, step by step, how to generate different types of SAS plots with PROC GPLOT. Five examples have been presented, each using a different INTERPOL= value in the SYMBOL statement. The results are summarized in the table below.

Plot                                 SYMBOL INTERPOL=
Regression line with 95% CI limits   RLCLM
Straight line or smooth line         JOIN or SPLINE, SMnn
Box plot                             BOXT
Line with bars                       HILOTJ
KM plot                              STEPLJ

Together with an annotate dataset and the INTERPOL= symbol option, users can easily generate customized SAS graphs with PROC GPLOT.

Acknowledgements

The authors would like to thank Maryann Williams for taking the time to review and comment on the manuscript.

Trademark Information

SAS and SAS/GRAPH software are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

Contact

Author Name: Sheng Zhang
Company: Merck & Co., Inc.
Address: 351 Sumneytown Pike
City: North Wales
State: PA ZIP: 19454
Email: sheng_zhang@merck.com

Author Name: Xingshu Zhu
Company: Merck & Co., Inc.
Address: 351 Sumneytown Pike
City: North Wales
State: PA ZIP: 19454
Email: xingshu_zhu@merck.com