Sample Size Calculation and Power Analysis for Design of Experiments Using Proc GLMPOWER Chii-Dean Joey Lin, SDSU, San Diego, CA


ABSTRACT
How many samples do I need? When a statistician or a quantitative analyst gets involved in a project, this is almost always one of the first questions asked. SAS has developed several tools to address this need: PSS (Power and Sample Size), Proc POWER, and Proc GLMPOWER are all available for use with various designs. Introductions to PSS and Proc POWER have been presented at many SAS-related conferences, but Proc GLMPOWER is relatively little known compared with the other two tools. In this paper, we show how one can easily understand the syntax of Proc GLMPOWER for sample size calculation or power analysis related to design of experiments. In an experimental design, Proc GLM allows one to test pre-planned contrasts. Similarly, Proc GLMPOWER can be used to calculate either the sample size or the power for any contrast one may have. The related syntax is discussed in this paper. In addition, we address how to take advantage of the full ODS features in Proc GLMPOWER.

INTRODUCTION
In conducting an experiment, sample size determination or power analysis is always an important topic. Calculating a sample size or determining power is getting easier nowadays thanks to innovative developments in statistical software. SAS has developed two procedures (Proc POWER and Proc GLMPOWER) in recent versions. SAS also has an application (PSS) that comes with the SAS/STAT product but is installed separately. As described in the SAS user manual, Proc POWER covers power and sample size analysis for a variety of more basic statistical analyses such as t tests, equivalence tests, binomial proportions, multiple regression, logistic regression, and some nonparametric tests. Proc GLMPOWER is designed to cover analyses for design of experiments that can be analyzed using Proc GLM.
However, tests and contrasts involving random effects are not supported by Proc GLMPOWER. In this paper, we focus our discussion on Proc GLMPOWER.

Statistical methods used for sample size and power analysis include the classical (frequentist) approach and the Bayesian approach. Proc GLMPOWER provides power analyses through classical sample size calculation. To decide the sample size, one has to know the design and the test method to be used, along with the desired Type I error rate (α), the effect size, the standard deviation of the response variable, and the desired power (equal to 1 - β, where β is the Type II error rate). The effect size is the standardized mean difference among treatments we wish to detect in our experiment. In Proc GLMPOWER, the effect size is entered through conjectured treatment means and a specified standard deviation of the response. The conjectured treatment means represent the scientifically meaningful differences an investigator wants to be able to detect. Discussions of sample size and power analysis can be found in Hoenig & Heisey (2001), Lenth (2001), and Littell et al. (2005). In addition, O'Brien (2004, 2006) presented excellent discussions of concepts and issues related to sample size analysis, and O'Brien (2006) also proposed the concepts of crucial Type I and Type II error rates.

In this paper, we introduce the basic syntax of Proc GLMPOWER. Since the GLMPOWER procedure is still an experimental version, its user manual is relatively concise, and only two examples are presented there. Although readers who are familiar with both Proc POWER and Proc GLM will find the manual adequate, we provide more experimental design examples for readers to follow. Proc GLMPOWER can be seen as a combination of Proc GLM (the first part of the syntax) and Proc POWER (the second part of the syntax). The first part tells Proc GLMPOWER the design for which we want to compute either the power or the sample size; any pre-planned contrasts can be specified there as well.
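For readers who want to see what a classical power calculation for an F test involves, here is a minimal Python sketch (standard library only). It rests on the textbook result that, under the alternative, the F statistic of a fixed-effects linear model follows a noncentral F(d1, d2, λ) distribution, so power is the probability that this statistic exceeds the level-α critical value. All function names are our own and hypothetical; the incomplete beta routine is a Numerical Recipes style continued fraction. The balanced-design search assumes λ = N × (a per-unit noncentrality) and error df = N minus the number of model parameters. This illustrates the idea only and is not GLMPOWER's internal algorithm.

```python
import math


def _betacf(a, b, x, max_iter=1000, eps=1e-12):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd coefficients of the continued fraction for this m
        for num in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                    -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """CDF of the central F(d1, d2) distribution."""
    return 0.0 if f <= 0.0 else betainc(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_critical(alpha, d1, d2):
    """Upper-alpha critical value of F(d1, d2), found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_f_test(lam, d1, d2, alpha=0.05):
    """P(F > F_crit) for F ~ noncentral F(d1, d2, lam), via a Poisson mixture of betas."""
    f_crit = f_critical(alpha, d1, d2)
    x = d1 * f_crit / (d1 * f_crit + d2)
    if lam <= 0.0:
        return 1.0 - betainc(d1 / 2.0, d2 / 2.0, x)
    half = lam / 2.0
    j_max = int(half + 12.0 * math.sqrt(half) + 50)
    total = 0.0
    for j in range(j_max + 1):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * (1.0 - betainc(d1 / 2.0 + j, d2 / 2.0, x))
    return total


def smallest_balanced_ntotal(lam_per_unit, d1, n_params, cells,
                             target=0.9, alpha=0.05, max_n=10_000):
    """Smallest N (a multiple of `cells`) whose power reaches `target`.

    Assumes lam = N * lam_per_unit and error df = N - n_params.
    """
    for n in range(cells, max_n + 1, cells):
        d2 = n - n_params
        if d2 > 0:
            p = power_f_test(n * lam_per_unit, d1, d2, alpha)
            if p >= target:
                return n, p
    return None
```

For example, for the Factor A test in an additive 2 × 3 model (1 numerator df; 4 parameters: intercept, 1 for A, 2 for B; 6 cells) and a hypothetical per-unit noncentrality of 0.09, `smallest_balanced_ntotal(0.09, 1, 4, 6)` steps N through 6, 12, 18, ... until the power first reaches 0.9, which mirrors why the balanced totals reported later are multiples of 6.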
It is best to supply the information Proc GLMPOWER needs through an exemplary data set, which provides the conjectured treatment means used by Proc GLMPOWER. The available statements for Proc GLMPOWER are listed below.

   PROC GLMPOWER < options > ;
      CLASS variables ;
      MODEL dependent-variable(s) = effect(s) ;
      CONTRAST 'label' effect values < ... effect values > ;
      WEIGHT variable ;
      POWER < options > ;
      PLOT < plot-options > ;

Note that the CLASS, MODEL, CONTRAST, and WEIGHT statements are the same as in Proc GLM, while the options of the POWER and PLOT statements mirror those of Proc POWER. Also note that more than one CONTRAST statement can be specified in Proc GLMPOWER; however, CONTRAST statements must appear after the MODEL statement. As in Proc GLM, the CLASS statement has to appear before the MODEL statement. Any variable (continuous or categorical) on the right side of the MODEL statement that is not listed in the CLASS statement is treated as a covariate of the model, and its effect is the same as that of an independent variable in a regression model. A model that includes both a classification effect (an effect listed in the CLASS statement) and a covariate (a variable on the right side of the MODEL statement but not in the CLASS statement) is called an Analysis of Covariance (ANCOVA) model.

SAMPLE CODES
Assume we have a two-factor factorial design (two-way ANOVA) with no interaction between Factor A and Factor B. Y is the response variable, Factor A has two levels, and Factor B has three levels, for a total of six treatments (the combinations of 2 levels of A and 3 levels of B). We are asked to calculate the sample size needed to test the factor effects (the Factor A effect and the Factor B effect). Both A and B are classification variables, so they should be listed in the CLASS statement. Before we call Proc GLMPOWER, the easiest way to enter the means is to create an exemplary data set containing the treatment means (cell means) for each combination of Factor A and Factor B levels. Table 1 shows the response means for these combinations.

Table 1. Response means by Factor A and Factor B
                 Factor B
   Factor A    B1     B2     B3
   A1          …      …      …
   A2          …      …      …

Now we enter the response means into a SAS data set.
   /* exemplary data set: one conjectured mean per A*B cell (Table 1) */
   data aa;
      input A $ B $ Y;
      cards;
   ...
   ;
   run;

To calculate the sample size needed to detect the Factor A effect and the Factor B effect, we use the following Proc GLMPOWER code. We need to provide the standard deviation of the response (assumed to be 5 here) from either a pilot study or previous studies, the Type I error rate α (we enter α = 0.05 for a two-sided test), and the desired power (we use 0.9). The program is shown in Code #1 and the output in Figure 1.

   /* Code #1 */
   title 'For Code #1';
   proc glmpower data = aa;
      class A B;
      model y = A B;
      power
         stddev = 5
         alpha  = 0.05
         ntotal = .
         power  = 0.9;
   run;
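To see how the conjectured cell means in an exemplary data set turn into the noncentrality λ that drives power, consider a balanced a × b layout analyzed with the additive model y = A B. The standard result is λ_A = n·b·Σ_i(μ_i. - μ..)²/σ², where n = N/(ab) is the number of units per cell, and analogously λ_B = n·a·Σ_j(μ_.j - μ..)²/σ². The Python sketch below assumes this balanced setting; the function name is our own, and the cell means are made up for illustration (they are not the values of Table 1).

```python
def main_effect_noncentrality(cell_means, n_total, sigma):
    """Noncentrality for the A and B main-effect F tests in a balanced a x b design.

    cell_means[i][j] is the conjectured mean for level i of A and level j of B;
    every cell is assumed to receive n_total / (a * b) units.
    """
    a, b = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (a * b)
    a_means = [sum(row) / b for row in cell_means]
    b_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
    n_cell = n_total / (a * b)
    # each A-level mean rests on n_cell * b units, each B-level mean on n_cell * a units
    lam_a = n_cell * b * sum((m - grand) ** 2 for m in a_means) / sigma ** 2
    lam_b = n_cell * a * sum((m - grand) ** 2 for m in b_means) / sigma ** 2
    return lam_a, lam_b


# Hypothetical cell means: rows are A1, A2; columns are B1, B2, B3
means = [[10.0, 12.0, 14.0],
         [13.0, 15.0, 17.0]]
lam_a, lam_b = main_effect_noncentrality(means, n_total=54, sigma=5.0)
print(round(lam_a, 2), round(lam_b, 2))  # 4.86 5.76
```

Because λ scales linearly with N while the conjectured differences and σ stay fixed, doubling N doubles λ; the procedure's job is then to find the N at which the noncentral F power reaches the target.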

Figure 1: Summary Table for Code #1

Figure 1 shows the summary output from Proc GLMPOWER. The calculated total sample size is 54 for testing the Factor A effect and 30 for testing the Factor B effect. Since we did not specify a weight for each treatment, a balanced design (every treatment receives the same number of experimental units) is assumed. That is, for the Factor A test each treatment receives 9 (= 54/6) randomly assigned units, and for the Factor B test each treatment receives 5 (= 30/6) units. One can see that the actual power for the Factor B test is considerably larger than the desired power of 0.9. This is caused by the balanced-design assumption: N total must be a multiple of 6, the number of treatments (2 levels of A × 3 levels of B). If we allow the study to be close to balanced but not necessarily exactly balanced, we can add the NFRACTIONAL option to the POWER statement, as Code #2 demonstrates.

   /* Code #2 */
   title3 'For Code #2';
   proc glmpower data = aa;
      class A B;
      model y = A B;
      power
         nfractional
         stddev = 5
         alpha  = 0.05
         ntotal = .
         power  = 0.9;
   run;

With the NFRACTIONAL option, the output (Figure 2) displays smaller total sample sizes for both the Factor A and Factor B tests. For testing the Factor A effect, a total of 53 experimental units is needed. These can be allocated by assigning 9 units to each of 5 randomly selected treatments and 8 units to the remaining treatment. The reduction in sample size is only 1 and 3 units for the Factor A and Factor B tests, respectively.
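The arithmetic behind balanced versus nearly balanced totals can be made concrete with two small helpers: one rounds a required total up to the next multiple of the number of treatments, and the other spreads a total over the treatments so that group sizes differ by at most one unit. This is a minimal Python sketch of that bookkeeping only (function names are hypothetical); it does not reproduce how GLMPOWER solves for N.

```python
import math


def balanced_ntotal(n_required, cells):
    """Round a required total up to the next multiple of the number of cells."""
    return math.ceil(n_required / cells) * cells


def near_balanced_allocation(n_total, cells):
    """Split n_total over cells as evenly as possible; sizes differ by at most 1."""
    base, extra = divmod(n_total, cells)
    return [base + 1] * extra + [base] * (cells - extra)


print(balanced_ntotal(53, 6))           # 54
print(near_balanced_allocation(53, 6))  # [9, 9, 9, 9, 9, 8]
```

The list is ordered for readability; in an actual study, the treatments that receive the extra unit would be chosen at random (for example by shuffling the list with Python's `random.shuffle`).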

Figure 2: Summary Table for Code #2

If we assume that there is an interaction between Factor A and Factor B, we can fit a model that includes an A*B interaction term. Code #3 reflects this scenario, and the results are shown in Figure 3.

   /* Code #3 */
   title3 'For Code #3';
   proc glmpower data = aa;
      class A B;
      model y = A B A*B;
      power
         stddev = 5
         alpha  = 0.05
         ntotal = .
         power  = 0.9;
   run;

Figure 3: Summary Table for Code #3
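Why an interaction test can demand a much larger sample is easier to see from its noncentrality. In a balanced a × b design, the A*B test has (a - 1)(b - 1) numerator degrees of freedom, and its λ depends only on the interaction effects γ_ij = μ_ij - μ_i. - μ_.j + μ.., that is, on departures of the cell means from additivity. The Python sketch below assumes that balanced setting; the function name and both sets of cell means are hypothetical.

```python
def interaction_noncentrality(cell_means, n_total, sigma):
    """Noncentrality and numerator df for the A*B F test in a balanced a x b design.

    The interaction effect of cell (i, j) is mu_ij - mu_i. - mu_.j + mu_..
    """
    a, b = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (a * b)
    a_means = [sum(row) / b for row in cell_means]
    b_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
    ss_int = sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                 for i in range(a) for j in range(b))
    n_cell = n_total / (a * b)
    return n_cell * ss_int / sigma ** 2, (a - 1) * (b - 1)


additive = [[10.0, 12.0, 14.0], [13.0, 15.0, 17.0]]   # hypothetical, no interaction
perturbed = [[10.0, 12.0, 14.0], [13.0, 15.0, 18.0]]  # hypothetical, one cell moved by 1
print(interaction_noncentrality(additive, 192, 5.0))
print(interaction_noncentrality(perturbed, 192, 5.0))
```

With N = 192 and σ = 5, the additive means give λ = 0 (no sample size can detect an interaction that is not there), and moving one cell mean by a single unit gives only λ = 32/75, about 0.43, on 2 numerator df. Near-additive conjectured means therefore push the required N up sharply.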

If the interaction effect is of interest, a total sample size of 192 is needed (192/6 = 32 units per treatment). Note that the exemplary data set was generated by simulation from a model with no interaction effect, so the conjectured interaction effects are small, and detecting such small differences among treatment means requires a much larger sample size.

To show a power curve, we can add a PLOT statement. See Code #4 and Figure 4 for the curves.

   /* Code #4 */
   title3 'For Code #4';
   proc glmpower data = aa;
      class A B;
      model y = A B;
      power
         stddev = 5
         alpha  = 0.05
         ntotal = .
         power  = 0.9;
      plot x = power min = 0.7 max = 0.95;
   run;

Figure 4: Power Curve for Code #4

A reference line can be added to the plot to make it easier to identify the necessary sample size. This is done by adding the REF= and CROSSREF= options to the XOPTS= option of the PLOT statement:

   plot x = power min = 0.7 max = 0.95 xopts = (ref = 0.9 crossref = yes);

Note that more than one reference line can be placed. The graph is shown in Figure 5.

Figure 5: Power Curve With Reference Lines

If we want different curves to have different marker symbols, we can add the VARY(SYMBOL) option to the PLOT statement (code not shown). So far we have calculated sample sizes for a specified power. If we want to calculate the power for a specified N total, we can use code like Code #5.

   /* Code #5 */
   title3 'For Code #5';
   proc glmpower data = aa;
      class A B;
      model y = A B;
      power
         stddev = 5
         alpha  = 0.05
         ntotal = 20, 30 to 40 by 5
         power  = .;
      plot x = n min = 18 max = 60;
   run;

The power calculated for each specified sample size is reported. Figure 6 shows the powers under the different N totals for testing both the Factor A and the Factor B effects. Note that Proc POWER allows one to use either the NPERGROUP or the NTOTAL option; in Proc GLMPOWER, however, only the NTOTAL option is available.

Figure 6: Summary Table for Code #5
Figure 7: Power Curves Generated by Code #5

If a pre-planned contrast such as B1 vs. B3 is of interest, we can determine the sample size needed for it by adding the following statement to one of the previous programs:

   contrast 'B1 vs. B3' B 1 0 -1;

Note that this statement must appear after the MODEL statement. When more than one test is conducted, one has to be aware of the overall Type I error rate. The typical α = 0.05 is specified for a single two-sided test. For multiple contrasts, the Type I error rate must be adjusted so that either the family-wise or the experiment-wise error rate is controlled at the desired level. One easy but conservative method is the Bonferroni adjustment: if there are 4 scientifically meaningful contrasts to be answered, testing each at α = 0.05/4 = 0.0125 ensures a family-wise error rate of at most 0.05. Other methods, such as Holm's and Tukey's methods, can also be used for the adjustment.

Analysis of Covariance (ANCOVA)
In calculating a sample size for the analysis of covariance (ANCOVA), Proc GLMPOWER requires additional information about the correlation between the covariate and the dependent variable. Two options, NCOVARIATES= and CORRXY=, need to be added to the POWER statement. See Code #6.

   /* Code #6 */
   title3 'For ANCOVA';
   proc glmpower data = aa;
      class A B;
      model y = A B;
      power
         ncovariates = 1
         corrxy = 0.2 0.4
         stddev = 5
         alpha  = 0.05
         ntotal = .
         power  = 0.9;
   run;

Only one CORRXY value is required; here we supply two to show the calculated sample sizes when the correlation between Y and the covariate is 0.2 and 0.4. From Figure 8, we can see that the higher the correlation between Y and the covariate, the smaller the needed sample size. In addition, the standard deviation used to calculate the sample size is reduced to σ√(1 - ρ²) because of the covariate. In our example, the adjusted standard deviations are 4.9 (= 5√(1 - 0.2²)) and 4.58 (= 5√(1 - 0.4²)) for correlations of 0.2 and 0.4, respectively, while the originally specified standard deviation is 5.

Figure 8: Summary Table for ANCOVA

Randomized Complete Block Design (RCBD)

As with the other designs, Proc GLMPOWER can be used for the RCBD. Assume an exemplary data set rcbd has been created. We can use the following code to calculate the necessary sample size (B is the block effect and A is the treatment effect). Since the block effect is generally not of main interest, we can use EFFECTS = (A) in the POWER statement to show the calculated sample size for the Factor A effect only. The calculated sample size for A is identical to the result obtained without the EFFECTS = (A) option.

   /* Code for RCBD */
   title3 'For RCBD';
   proc glmpower data = rcbd;
      class B A;
      model y = B A;
      power
         effects = (A)
         stddev = 5     /* assumed value; use the standard deviation for your study */
         alpha  = 0.05
         ntotal = .
         power  = 0.9;
   run;

Nested Design
In a nested design, we assume Factor B is nested in Factor A. The sample size calculation can be done with the model statement

   model y = A B(A);

Note that there is no interaction term in the model statement: because each level of B appears within only one level of A, an A*B interaction cannot be separated from the B(A) effect.

CONCLUSION
This paper briefly introduces Proc GLMPOWER and provides sample code for using it in a variety of situations. Proc GLMPOWER can be viewed as a combination of Proc GLM and Proc POWER. Most designs that Proc GLM can analyze, except models involving random effects, can use Proc GLMPOWER for sample size calculation or power analysis. When calculating a sample size, one has to be aware of possible dropouts during the experiment: if a dropout rate is known from a similar study, it should be used to inflate the required sample size. Please note that Proc GLMPOWER and Proc POWER are designed to provide sample size calculations and prospective power analyses in the planning stage of a study; they are not intended for retrospective power analysis. The current version of Proc GLMPOWER does not support sample size calculation for repeated measures analysis. However, one can use Proc MIXED and a noncentral F function to calculate the necessary sample size manually.

REFERENCES
Hoenig and Heisey (2001). The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis. The American Statistician, 55: 19-24.
Lenth (2001). Some Practical Guidelines for Effective Sample Size Determination. The American Statistician, 55.
Littell, Milliken, Stroup, Wolfinger, and Schabenberger (2005). SAS for Mixed Models, 2nd Ed. Cary, NC: SAS Institute Inc.
O'Brien and Castelloe (2004). Sample-Size Analysis in Study Planning: Concepts and Issues, with Examples Using Proc POWER and Proc GLMPOWER. In Proceedings of the SAS Users Group International (SUGI) Conference. Cary, NC: SAS Institute Inc.
O'Brien, R. G. and Castelloe, J. (2007). Sample-Size Analysis for Traditional Hypothesis Testing: Concepts and Issues. In Pharmaceutical Statistics Using SAS: A Practical Guide, ed. A. Dmitrienko, C. Chuang-Stein, and R. D'Agostino. Cary, NC: SAS Institute Inc., Chapter 10.
SAS Institute Inc. Introduction to Power and Sample Size Analysis. Cary, NC: SAS Institute Inc.
SAS Institute Inc. SAS/STAT 9.2 User's Guide. Cary, NC: SAS Institute Inc.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Dr. Joey Lin
Department of Mathematics & Statistics, San Diego State University
San Diego, CA
cdlin@sciences.sdsu.edu

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.


More information

THE IMPORTANCE OF TEACHING POWER IN STATISTICAL HYPOTHESIS TESTING 1. Alan Olinsky, Bryant University, (401) 232-6266, aolinsky@bryant.

THE IMPORTANCE OF TEACHING POWER IN STATISTICAL HYPOTHESIS TESTING 1. Alan Olinsky, Bryant University, (401) 232-6266, aolinsky@bryant. THE IMPORTANCE OF TEACHING POWER IN STATISTICAL HYPOTHESIS TESTING 1 Alan Olinsky, Bryant University, (401) 232-6266, aolinsky@bryant.edu * Phyllis Schumacher, Bryant University, (401) 232-6328, pschumac@bryant.edu

More information

SUGI 29 Statistics and Data Analysis

SUGI 29 Statistics and Data Analysis Paper 194-29 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,

More information

Chapter Eight: Quantitative Methods

Chapter Eight: Quantitative Methods Chapter Eight: Quantitative Methods RESEARCH DESIGN Qualitative, Quantitative, and Mixed Methods Approaches Third Edition John W. Creswell Chapter Outline Defining Surveys and Experiments Components of

More information

Multivariate Analysis of Variance (MANOVA)

Multivariate Analysis of Variance (MANOVA) Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various

More information

Binary Logistic Regression

Binary Logistic Regression Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including

More information

ii. More than one per Trt*Block -> EXPLORATORY MODEL

ii. More than one per Trt*Block -> EXPLORATORY MODEL 1. What is the experimental unit? WHAT IS RANDOMIZED. Are we taking multiple measures of the SAME e.u.? a. Subsamples AVERAGE FIRST! b. Repeated measures in time 3. Are there blocking variables? a. No

More information

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

More information

Chapter 6 INTERVALS Statement. Chapter Table of Contents

Chapter 6 INTERVALS Statement. Chapter Table of Contents Chapter 6 INTERVALS Statement Chapter Table of Contents OVERVIEW...217 GETTING STARTED...218 ComputingStatisticalIntervals...218 ComputingOne-SidedLowerPredictionLimits...220 SYNTAX...222 SummaryofOptions...222

More information

Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine

Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine 2 - Manova 4.3.05 25 Multivariate Analysis of Variance What Multivariate Analysis of Variance is The general purpose of multivariate analysis of variance (MANOVA) is to determine whether multiple levels

More information

Two-Way ANOVA tests. I. Definition and Applications...2. II. Two-Way ANOVA prerequisites...2. III. How to use the Two-Way ANOVA tool?...

Two-Way ANOVA tests. I. Definition and Applications...2. II. Two-Way ANOVA prerequisites...2. III. How to use the Two-Way ANOVA tool?... Two-Way ANOVA tests Contents at a glance I. Definition and Applications...2 II. Two-Way ANOVA prerequisites...2 III. How to use the Two-Way ANOVA tool?...3 A. Parametric test, assume variances equal....4

More information

Lesson 17: Margin of Error When Estimating a Population Proportion

Lesson 17: Margin of Error When Estimating a Population Proportion Margin of Error When Estimating a Population Proportion Classwork In this lesson, you will find and interpret the standard deviation of a simulated distribution for a sample proportion and use this information

More information

Chapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data.

Chapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data. Chapter 15 Mixed Models A flexible approach to correlated data. 15.1 Overview Correlated data arise frequently in statistical analyses. This may be due to grouping of subjects, e.g., students within classrooms,

More information

Chapter 12 Nonparametric Tests. Chapter Table of Contents

Chapter 12 Nonparametric Tests. Chapter Table of Contents Chapter 12 Nonparametric Tests Chapter Table of Contents OVERVIEW...171 Testing for Normality...... 171 Comparing Distributions....171 ONE-SAMPLE TESTS...172 TWO-SAMPLE TESTS...172 ComparingTwoIndependentSamples...172

More information

Study Design and Statistical Analysis

Study Design and Statistical Analysis Study Design and Statistical Analysis Anny H Xiang, PhD Department of Preventive Medicine University of Southern California Outline Designing Clinical Research Studies Statistical Data Analysis Designing

More information

Come scegliere un test statistico

Come scegliere un test statistico Come scegliere un test statistico Estratto dal Capitolo 37 of Intuitive Biostatistics (ISBN 0-19-508607-4) by Harvey Motulsky. Copyright 1995 by Oxfd University Press Inc. (disponibile in Iinternet) Table

More information

How to get accurate sample size and power with nquery Advisor R

How to get accurate sample size and power with nquery Advisor R How to get accurate sample size and power with nquery Advisor R Brian Sullivan Statistical Solutions Ltd. ASA Meeting, Chicago, March 2007 Sample Size Two group t-test χ 2 -test Survival Analysis 2 2 Crossover

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical

More information

Principles of Hypothesis Testing for Public Health

Principles of Hypothesis Testing for Public Health Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions

More information

Introduction to Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction

More information

Point Biserial Correlation Tests

Point Biserial Correlation Tests Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable

More information

Nursing Journal Toolkit: Critiquing a Quantitative Research Article

Nursing Journal Toolkit: Critiquing a Quantitative Research Article A Virtual World Consortium: Using Second Life to Facilitate Nursing Journal Clubs Nursing Journal Toolkit: Critiquing a Quantitative Research Article 1. Guidelines for Critiquing a Quantitative Research

More information

ln(p/(1-p)) = α +β*age35plus, where p is the probability or odds of drinking

ln(p/(1-p)) = α +β*age35plus, where p is the probability or odds of drinking Dummy Coding for Dummies Kathryn Martin, Maternal, Child and Adolescent Health Program, California Department of Public Health ABSTRACT There are a number of ways to incorporate categorical variables into

More information

SAS Analyst for Windows Tutorial

SAS Analyst for Windows Tutorial Updated: August 2012 Table of Contents Section 1: Introduction... 3 1.1 About this Document... 3 1.2 Introduction to Version 8 of SAS... 3 Section 2: An Overview of SAS V.8 for Windows... 3 2.1 Navigating

More information

Better decision making under uncertain conditions using Monte Carlo Simulation

Better decision making under uncertain conditions using Monte Carlo Simulation IBM Software Business Analytics IBM SPSS Statistics Better decision making under uncertain conditions using Monte Carlo Simulation Monte Carlo simulation and risk analysis techniques in IBM SPSS Statistics

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

Multinomial and ordinal logistic regression using PROC LOGISTIC Peter L. Flom National Development and Research Institutes, Inc

Multinomial and ordinal logistic regression using PROC LOGISTIC Peter L. Flom National Development and Research Institutes, Inc ABSTRACT Multinomial and ordinal logistic regression using PROC LOGISTIC Peter L. Flom National Development and Research Institutes, Inc Logistic regression may be useful when we are trying to model a

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

More information

University of Arkansas Libraries ArcGIS Desktop Tutorial. Section 2: Manipulating Display Parameters in ArcMap. Symbolizing Features and Rasters:

University of Arkansas Libraries ArcGIS Desktop Tutorial. Section 2: Manipulating Display Parameters in ArcMap. Symbolizing Features and Rasters: : Manipulating Display Parameters in ArcMap Symbolizing Features and Rasters: Data sets that are added to ArcMap a default symbology. The user can change the default symbology for their features (point,

More information

Introduction to nonparametric regression: Least squares vs. Nearest neighbors

Introduction to nonparametric regression: Least squares vs. Nearest neighbors Introduction to nonparametric regression: Least squares vs. Nearest neighbors Patrick Breheny October 30 Patrick Breheny STA 621: Nonparametric Statistics 1/16 Introduction For the remainder of the course,

More information

Design and Analysis of Phase III Clinical Trials

Design and Analysis of Phase III Clinical Trials Cancer Biostatistics Center, Biostatistics Shared Resource, Vanderbilt University School of Medicine June 19, 2008 Outline 1 Phases of Clinical Trials 2 3 4 5 6 Phase I Trials: Safety, Dosage Range, and

More information

RARITAN VALLEY COMMUNITY COLLEGE ACADEMIC COURSE OUTLINE MATH 111H STATISTICS II HONORS

RARITAN VALLEY COMMUNITY COLLEGE ACADEMIC COURSE OUTLINE MATH 111H STATISTICS II HONORS RARITAN VALLEY COMMUNITY COLLEGE ACADEMIC COURSE OUTLINE MATH 111H STATISTICS II HONORS I. Basic Course Information A. Course Number and Title: MATH 111H Statistics II Honors B. New or Modified Course:

More information

Description. Textbook. Grading. Objective

Description. Textbook. Grading. Objective EC151.02 Statistics for Business and Economics (MWF 8:00-8:50) Instructor: Chiu Yu Ko Office: 462D, 21 Campenalla Way Phone: 2-6093 Email: kocb@bc.edu Office Hours: by appointment Description This course

More information

Confidence Intervals for One Standard Deviation Using Standard Deviation

Confidence Intervals for One Standard Deviation Using Standard Deviation Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from

More information

Analysis of Variance ANOVA

Analysis of Variance ANOVA Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

8. Comparing Means Using One Way ANOVA

8. Comparing Means Using One Way ANOVA 8. Comparing Means Using One Way ANOVA Objectives Calculate a one-way analysis of variance Run various multiple comparisons Calculate measures of effect size A One Way ANOVA is an analysis of variance

More information

Sensitivity Analysis in Multiple Imputation for Missing Data

Sensitivity Analysis in Multiple Imputation for Missing Data Paper SAS270-2014 Sensitivity Analysis in Multiple Imputation for Missing Data Yang Yuan, SAS Institute Inc. ABSTRACT Multiple imputation, a popular strategy for dealing with missing values, usually assumes

More information

Minitab Tutorials for Design and Analysis of Experiments. Table of Contents

Minitab Tutorials for Design and Analysis of Experiments. Table of Contents Table of Contents Introduction to Minitab...2 Example 1 One-Way ANOVA...3 Determining Sample Size in One-way ANOVA...8 Example 2 Two-factor Factorial Design...9 Example 3: Randomized Complete Block Design...14

More information

exspline That: Explaining Geographic Variation in Insurance Pricing

exspline That: Explaining Geographic Variation in Insurance Pricing Paper 8441-2016 exspline That: Explaining Geographic Variation in Insurance Pricing Carol Frigo and Kelsey Osterloo, State Farm Insurance ABSTRACT Generalized linear models (GLMs) are commonly used to

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

Illustration (and the use of HLM)

Illustration (and the use of HLM) Illustration (and the use of HLM) Chapter 4 1 Measurement Incorporated HLM Workshop The Illustration Data Now we cover the example. In doing so we does the use of the software HLM. In addition, we will

More information

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest Analyzing Intervention Effects: Multilevel & Other Approaches Joop Hox Methodology & Statistics, Utrecht Simplest Intervention Design R X Y E Random assignment Experimental + Control group Analysis: t

More information

Statistics in Medicine Research Lecture Series CSMC Fall 2014

Statistics in Medicine Research Lecture Series CSMC Fall 2014 Catherine Bresee, MS Senior Biostatistician Biostatistics & Bioinformatics Research Institute Statistics in Medicine Research Lecture Series CSMC Fall 2014 Overview Review concept of statistical power

More information

ABSTRACT INTRODUCTION

ABSTRACT INTRODUCTION Paper SP03-2009 Illustrative Logistic Regression Examples using PROC LOGISTIC: New Features in SAS/STAT 9.2 Robert G. Downer, Grand Valley State University, Allendale, MI Patrick J. Richardson, Van Andel

More information

SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

More information

Box-and-Whisker Plots with The SAS System David Shannon, Amadeus Software Limited

Box-and-Whisker Plots with The SAS System David Shannon, Amadeus Software Limited Box-and-Whisker Plots with The SAS System David Shannon, Amadeus Software Limited Abstract One regularly used graphical method of presenting data is the box-and-whisker plot. Whilst the vast majority of

More information

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217 Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing

More information