SAS Programmer's check list - quick checks to be done before the statistical reports go off the SAS Programmer's table
PharmaSUG Paper IB11

Thomas T. Joseph, Independent Consultant
Babruvahan Hottengada, Independent Consultant

ABSTRACT

"The statistical output looks absurd" is not the coolest thing to hear. If the data are clean and the output still looks absurd, then it is a programming issue, because SAS/STAT procedures can do no wrong. Our experiences as programmers encouraged us to write down this check list for statistical programming. The essence of this paper is a set of primary checks that help avoid programming errors in statistical reporting for clinical trials, to be done before QC and review by the statistical team. The few procedures and checks discussed below do not require deep mathematical or statistical knowledge; they can be done by hand or with a calculator. The later part of the paper discusses the pitfalls that can occur when pooling data for safety and efficacy reporting.

INTRODUCTION

Many programmers, especially those with non-statistical backgrounds, make the mistake of passing on reports generated from SAS procedures without looking at the numbers in the output. This can result in correct information being displayed incorrectly, or incorrect information being displayed incorrectly. This paper highlights common mistakes made while programming statistical reports and then discusses some quick checks that can be done after programming, without writing any code, to check the values. The focus of this paper is primarily on SAS/STAT procedures that generate LS means estimates, LS means differences, confidence intervals/limits, and odds ratios, as well as on checks of datasets prior to pooling.

There are several methods for generating LS means estimates, LS means differences, confidence intervals, and odds ratios, but in this paper we consider a single SAS/STAT procedure (GLM) for LS means, LS means differences, and confidence intervals. Similarly, we use PROC TTEST to generate degrees of freedom and confidence intervals, and PROC FREQ to generate the odds ratio statistic. The checks discussed in this paper can be applied to other SAS/STAT procedures that generate the same statistics.

CHECKS ON LS MEANS ESTIMATE, CONFIDENCE INTERVALS, AND LS MEANS DIFFERENCE

Least squares means (LS means) can be defined as a linear combination (sum) of the estimated effects (means, etc.) from a linear model. These means are based on the model used. The LSMEANS statement in SAS procedures is sometimes used when one or more covariates appear in the model, such as in ANCOVA [1]. PROC GLM, PROC MIXED, and PROC GENMOD are some of the SAS procedures that generate least squares means.

The confidence interval is another common statistic generated by SAS/STAT procedures. A confidence interval is a range of values within which the value of a particular parameter is expected to fall. A 95% confidence interval is constructed so that, over repeated sampling, about 95 percent of such intervals contain the true value of the parameter.

Below is an example of an analysis of covariance done using PROC GLM, where baseline score, treatment arm, and location are the independent variables and mscore is the dependent variable. Variables ptid, trtarm, baseline, mscore, and location contain information on patient identifiers, treatment arms, baseline score, final score, and location respectively. Using SAS/ODS, the results of the analysis are written to datasets, and finally the desired datasets are merged to produce an output similar to Figure 1 below.
The following is the data step, the analysis code using PROC GLM, and the output. The numeric baseline and mscore values of each record are not preserved in this transcription.

    data class;
      input ptid trtarm $ baseline mscore location $;
      datalines;
    1001 A AZ
    1002 A AK
    1003 A KA
    1004 A KA
    1005 A KA
    1006 A KA
    1007 A AZ
    1001 B AK
    1002 B AK
    1003 B AZ
    1004 B AZ
    ;
    run;

    ods trace on;
    /* Route the LS means tables to datasets */
    ods output GLM.LSMEANS.trtarm.mscore.LSMeanDiffCL=diffcl;
    ods output GLM.LSMEANS.trtarm.mscore.LSMeanCL=meancl;
    ods output GLM.LSMEANS.trtarm.mscore.LSMeans=lsmeans;

    proc glm data=class;
      format trtarm $15.;
      class trtarm location;
      model mscore = trtarm baseline location;
      lsmeans trtarm / pdiff cl;
    run;
    quit;
    ods trace off;

    /* Attach the A-versus-B difference to the treatment A record */
    data diffcl;
      set diffcl;
      if i=1 then trtarm='A';
    run;

    /* LowerCL and UpperCL are not renamed here; this causes the error in Figure 1 */
    data statistics;
      merge meancl(in=a keep=trtarm lowercl lsmean uppercl dependent effect)
            diffcl(in=b keep=effect dependent uppercl lowercl difference trtarm);
      by effect trtarm;
    run;

    proc print data=statistics;
    run;

Errors related to confidence intervals can easily be detected without writing quality-check code. Checking whether the LS means estimate falls within the lower and upper confidence limits, and whether the difference of the LS means equals the reported difference, can help identify a programming error. Figure 1 displays the output of the PRINT procedure, while Figure 2 displays the results of PROC GLM.

Figure 1. Output of PROC PRINT on the statistics dataset (numeric values not preserved)
Columns: Effect, Dependent, trtarm, LowerCL, LSMean, UpperCL, Difference
Rows: trtarm / mscore / A; trtarm / mscore / B

The output shown in Figure 1 has an apparent error: the LS mean estimate for trtarm A does not fall within the lower and upper confidence limits, so the result is erroneous. This happened because the variables were not renamed during the merge that generates the statistics dataset. LowerCL and UpperCL exist in both meancl and diffcl, so for the matched record the limits of the difference overwrite the limits of the LS mean.

The LS means difference for effect trtarm can be checked using the following equation:

LS means difference for effect trtarm = LS mean of A - LS mean of B

In this case, the difference obtained by subtraction equals the value displayed in Figures 1 and 2. Erroneous results for the difference of LS means could also arise from programming errors.
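The two checks described above can also be scripted once the ODS datasets exist. The following is a minimal sketch rather than the authors' code. It assumes the datasets meancl and diffcl created by the PROC GLM step still carry their ODS variable names (LSMean, LowerCL, UpperCL, Difference, i, j) and that treatment arms A and B are the first and second LS means levels. The dataset names ci_check and diff_check, the flag variables, and the 1e-8 tolerance are illustrative assumptions.

```sas
/* Check 1: each LS mean must lie between its own confidence limits. */
data ci_check;
  set meancl;
  length ci_flag $5;
  if lowercl <= lsmean <= uppercl then ci_flag = 'OK';
  else ci_flag = 'CHECK';
run;

/* Check 2: the reported difference must equal LSMean(A) - LSMean(B). */
proc sql;
  create table diff_check as
  select d.difference,
         a.lsmean - b.lsmean as recomputed,
         case
           when abs(d.difference - calculated recomputed) < 1e-8 then 'OK'
           else 'CHECK'
         end as diff_flag
  from diffcl as d, meancl as a, meancl as b
  where d.i = 1 and d.j = 2
    and a.trtarm = 'A' and b.trtarm = 'B';
quit;

proc print data=ci_check;   run;
proc print data=diff_check; run;
```

Running the checks on the unmerged ODS datasets, before any renaming or merging, keeps the comparison independent of the reporting code that is being verified.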
Figure 2. Output of PROC GLM, Least Squares Means (numeric values not preserved)
Table 1 columns: trtarm, mscore LSMEAN, H0:LSMean1=LSMean2 Pr > |t| (rows A and B)
Table 2 columns: trtarm, mscore LSMEAN, 95% Confidence Limits (rows A and B)
Table 3, Least Squares Means for Effect trtarm, columns: i, j, Difference Between Means, 95% Confidence Limits for LSMean(i)-LSMean(j)

So, the checks that can be done on LS means, confidence intervals, and LS means differences are as follows:

1. The estimate of the LS mean should lie between the confidence limits of the LS mean.
2. The difference of the LS means estimates should equal the LS means difference statistic. In Figure 2, the mscore LS means statistics and the LS means difference have been highlighted.
3. Keep an eye on variable names, especially when the Output Delivery System is used to generate datasets.

CHECKS FOR DEGREES OF FREEDOM, CONFIDENCE INTERVALS

A one-sample t-test is usually done using PROC MEANS, PROC UNIVARIATE, or PROC TTEST. Some of the statistics of interest generated by these procedures are degrees of freedom and confidence intervals. Degrees of freedom is usually N-1, where N is the number of observations. Confidence intervals can be calculated using $\bar{Y} \pm t_{1-\alpha/2,\,N-1}\, s/\sqrt{N}$, where N is the sample size, s is the standard deviation, t is the critical value of the t distribution at significance level alpha with N-1 degrees of freedom, and $\bar{Y}$ is the mean. It should be noted that as the sample size decreases, the confidence interval tends to become wider.

The following dataset and code are used to show the variation of the confidence intervals with respect to the number of observations or degrees of freedom. The data records are not preserved in this transcription.

    data class;
      input ptid cycle value;
      datalines;
    ;
    run;

    proc ttest data=class;
      by cycle;
      var value;
    run;

Figure 3. Output of PROC TTEST by cycle (numeric values not preserved)
For cycle=1 and cycle=2, a Statistics table with columns: Variable, N, Lower CL Mean, Mean, Upper CL Mean, Lower CL Std Dev, Std Dev, Upper CL Std Dev, Std Err, Minimum, Maximum
For cycle=1 and cycle=2, a T-Tests table with columns: Variable, DF, t Value, Pr > |t| (the cycle=1 p-value is <.0001)

Degrees of freedom (at cycle 1) = number of observations - 1 = 9 - 1 = 8 (highlighted in Figure 3). If you look closely, you will see that with decreasing degrees of freedom, or decreasing number of observations, the width of the confidence interval increases. This is not always the case, however, as it is data driven. An easy check for the mean statistic and confidence interval/limits while generating the final report is to see whether the mean statistic falls within the confidence interval/limits.

CHECKS FOR ODDS RATIO

The odds ratio is another statistic widely used in the pharmaceutical industry to compare treatment arms. As the name suggests, the odds ratio is the ratio of the odds of an event happening in one group to the odds of it happening in a different group, or simply the ratio of odds [2].

               Treatment A   Treatment B
    Total N    Na            Nb
    Event      na            nb

Odds ratio = odds of an event happening in treatment A / odds of the same event happening in treatment B = (na/(Na-na)) / (nb/(Nb-nb))
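The formula above is simple enough to verify in a short DATA step as well as on a calculator. The following is a minimal sketch, not part of the original paper. The counts are the ones used in the paper's own Figure 4 arithmetic (2 events among 7 subjects in the first group, 2 among 5 in the second). The variable names tot_a, evt_a, tot_b, and evt_b are illustrative assumptions, chosen because SAS variable names are not case-sensitive and Na and na could not coexist.

```sas
data or_check;
  /* Counts from the paper's Figure 4 calculation */
  tot_a = 7; evt_a = 2;   /* group A: total subjects and subjects with the event */
  tot_b = 5; evt_b = 2;   /* group B: total subjects and subjects with the event */

  odds_a = evt_a / (tot_a - evt_a);   /* odds of the event in group A */
  odds_b = evt_b / (tot_b - evt_b);   /* odds of the event in group B */
  odds_ratio = odds_a / odds_b;       /* compare with the PROC FREQ estimate */

  put odds_a= odds_b= odds_ratio=;
run;
```

With these counts the odds are 2/5 and 2/3, so the ratio works out to 0.6, which matches the value the paper reports for Figure 4.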
Below is an example where odds ratio output has been generated. In this example, the odds ratio of an event happening in a particular treatment, identified by a sorting order variable named s_order, is generated. The odds ratio of another event, held in the variable secevent, is also calculated. The purpose of this example is to show how the odds ratio can change with changing values of the events. Of the twelve data records, only the value column (A A A B B A A B B A A A) is preserved in this transcription.

    data class;
      input ptid s_order value $ event secevent;
      datalines;
    ;
    run;

    proc freq data=class;
      tables s_order*event / norow nocol nopercent alpha=0.05 cmh;
    run;

Figure 4. Output of PROC FREQ for s_order by event (numeric values not preserved)
Frequency table of s_order by event (event = 0, 1), with totals
Cochran-Mantel-Haenszel Statistics (Based on Table Scores): Nonzero Correlation, Row Mean Scores Differ, General Association, with columns Statistic, Alternative Hypothesis, DF, Value, Prob
Estimates of the Common Relative Risk (Row1/Row2): Case-Control (Odds Ratio), Cohort (Col1 Risk), Cohort (Col2 Risk), each by the Mantel-Haenszel and Logit methods, with columns Type of Study, Method, Value, 95% Confidence Limits
Total Sample Size = 12

In Figure 4, the odds ratio generated is 0.6 (highlighted in Figure 4), which is the ratio of the odds of event 0 happening in s_order 1 to the odds of event 0 happening in s_order 2. The odds ratio in Figure 4 is obtained as follows:

(na/(Na-na)) / (nb/(Nb-nb)) = ((na/Na)/(1-(na/Na))) / ((nb/Nb)/(1-(nb/Nb))) = ((2/7)/(5/7)) / ((2/5)/(3/5)) = 0.6

Since the values of the variable secevent are slightly different from those of the variable event in the dataset class, the odds ratio output changes drastically. The following code generates the odds ratio of event 1 happening in s_order 1 to event 1 happening in s_order 2, and shows how much the value of the odds ratio has changed.

    proc freq data=class;
      tables s_order*secevent / norow nocol nopercent alpha=0.05 cmh;
    run;

Figure 5. Output of PROC FREQ for s_order by secevent (numeric values not preserved)
Frequency table of s_order by secevent (secevent = 1, 2), with totals
Cochran-Mantel-Haenszel Statistics (Based on Table Scores): Nonzero Correlation, Row Mean Scores Differ, General Association, with columns Statistic, Alternative Hypothesis, DF, Value, Prob
Estimates of the Common Relative Risk (Row1/Row2): Case-Control (Odds Ratio), Cohort (Col1 Risk), Cohort (Col2 Risk), each by the Mantel-Haenszel and Logit methods, with columns Type of Study, Method, Value, 95% Confidence Limits
Total Sample Size = 12

In Figure 5, the odds ratio generated (highlighted in Figure 5) is the ratio of the odds of event 1 happening in s_order 1 to the odds of event 1 happening in s_order 2. The odds ratio from Figure 5 is obtained as follows:

(na/(Na-na)) / (nb/(Nb-nb)) = ((na/Na)/(1-(na/Na))) / ((nb/Nb)/(1-(nb/Nb))) = ((5/7)/(2/7)) / ((3/5)/(2/5)) = 2.5/1.5, which is approximately 1.67

The outputs generated in both cases are correct, but ignoring which output is generated and which is to be displayed could result in passing on a report with the wrong information, and finally in wrong inferences. So, we suggest calculating the values manually and re-checking what is generated against what is displayed.

CHECKS BEFORE POOLING DATASETS

Special attention has to be given while pooling datasets, because different datasets may have different attributes, use different dictionaries, and collect information in different units. If proper attention is not paid, the pooled dataset can produce absurd results. Some of the checks we do before pooling datasets are as follows:

1. Check for consistency of attributes (lengths, formats, and types) across the datasets. Consistent attributes avoid erroneous results such as records with truncated information because of varying lengths, or misrepresented information because of different formats.
2. Standardization of data is strongly recommended before pooling the datasets. This process produces results with fewer errors.
3. Different versions of the WHO Drug and MedDRA dictionaries can result in varying categorization of preferred terms for adverse events, concomitant medications, etc. So it is recommended to use the same version, or the latest version, of the dictionaries.

CONCLUSION

The checks discussed in this paper are precautions to keep in mind while programming. These checks can reduce the chances of producing erroneous results.
Only a handful of statistics, the checks related to those statistics, and precautions for pooling datasets have been discussed in this paper. The topic is very broad in nature and could be extended to other statistical procedures widely used in the pharmaceutical industry and beyond.

REFERENCES

[1]
[2] Richard Severino, "PROC FREQ: It's More Than Counts", The Queen's Medical Center, Honolulu, HI

ACKNOWLEDGEMENTS

I would like to thank Carol Mathews for reviewing this paper and providing excellent comments.

CONTACT INFORMATION

Thomas Joseph
Independent Consultant
New London, CT
thomast.joseph@yahoo.com

Babruvahan Hottengada
Independent Consultant
New London, CT
babru.hottengada@gmail.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
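As an illustration of check 1 under CHECKS BEFORE POOLING DATASETS, attribute consistency can be compared programmatically before the datasets are combined. The following is a minimal sketch, not the authors' code. WORK.STUDY1 and WORK.STUDY2 are hypothetical names standing in for the datasets to be pooled, and attr_mismatch is an illustrative name for the result. The sketch reads variable attributes from the DICTIONARY.COLUMNS table in PROC SQL and lists every variable that is missing from one dataset or differs in type, length, or format.

```sas
/* Hypothetical inputs: WORK.STUDY1 and WORK.STUDY2 are the datasets to be pooled. */
proc sql;
  create table attr_mismatch as
  select coalesce(a.varname, b.varname) as varname,
         a.type   as type1,   b.type   as type2,
         a.length as length1, b.length as length2,
         a.format as format1, b.format as format2
  from (select upcase(name) as varname, type, length, format
          from dictionary.columns
          where libname = 'WORK' and memname = 'STUDY1') as a
       full join
       (select upcase(name) as varname, type, length, format
          from dictionary.columns
          where libname = 'WORK' and memname = 'STUDY2') as b
    on a.varname = b.varname
  /* Keep only variables that are absent from one side or differ in any attribute */
  where a.varname is null or b.varname is null
     or a.type   ne b.type
     or a.length ne b.length
     or a.format ne b.format;
quit;

proc print data=attr_mismatch;
run;
```

An empty attr_mismatch dataset suggests the two datasets agree on these attributes. It says nothing about units or dictionary versions, which still need the separate checks listed in that section.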
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationBiostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY
Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to
More informationLAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.
More informationOverview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
More informationIndependent samples t-test. Dr. Tom Pierce Radford University
Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of
More information1 Simple Linear Regression I Least Squares Estimation
Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and
More informationData Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
More informationSENSITIVITY ANALYSIS AND INFERENCE. Lecture 12
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationSimple Linear Regression
STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze
More informationStudy Guide for the Final Exam
Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make
More informationUNDERSTANDING THE DEPENDENT-SAMPLES t TEST
UNDERSTANDING THE DEPENDENT-SAMPLES t TEST A dependent-samples t test (a.k.a. matched or paired-samples, matched-pairs, samples, or subjects, simple repeated-measures or within-groups, or correlated groups)
More informationModeling Lifetime Value in the Insurance Industry
Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting
More informationChapter 2 Probability Topics SPSS T tests
Chapter 2 Probability Topics SPSS T tests Data file used: gss.sav In the lecture about chapter 2, only the One-Sample T test has been explained. In this handout, we also give the SPSS methods to perform
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationThis can be useful to temporarily deactivate programming segments without actually deleting the statements.
EXST 700X SAS Programming Tips Page 1 SAS Statements: All SAS statements end with a semicolon, ";". A statement may occur on one line, or run across several lines. Several statements can also be placed
More information2. Making example missing-value datasets: MCAR, MAR, and MNAR
Lecture 20 1. Types of missing values 2. Making example missing-value datasets: MCAR, MAR, and MNAR 3. Common methods for missing data 4. Compare results on example MCAR, MAR, MNAR data 1 Missing Data
More informationFamily economics data: total family income, expenditures, debt status for 50 families in two cohorts (A and B), annual records from 1990 1995.
Lecture 18 1. Random intercepts and slopes 2. Notation for mixed effects models 3. Comparing nested models 4. Multilevel/Hierarchical models 5. SAS versions of R models in Gelman and Hill, chapter 12 1
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a
More informationBuilding and Customizing a CDISC Compliance and Data Quality Application Wayne Zhong, Accretion Softworks, Chester Springs, PA
WUSS2015 Paper 84 Building and Customizing a CDISC Compliance and Data Quality Application Wayne Zhong, Accretion Softworks, Chester Springs, PA ABSTRACT Creating your own SAS application to perform CDISC
More informationQuick Start to Data Analysis with SAS Table of Contents. Chapter 1 Introduction 1. Chapter 2 SAS Programming Concepts 7
Chapter 1 Introduction 1 SAS: The Complete Research Tool 1 Objectives 2 A Note About Syntax and Examples 2 Syntax 2 Examples 3 Organization 4 Chapter by Chapter 4 What This Book Is Not 5 Chapter 2 SAS
More informationDescriptive Analysis
Research Methods William G. Zikmund Basic Data Analysis: Descriptive Statistics Descriptive Analysis The transformation of raw data into a form that will make them easy to understand and interpret; rearranging,
More information6 Variables: PD MF MA K IAH SBS
options pageno=min nodate formdlim='-'; title 'Canonical Correlation, Journal of Interpersonal Violence, 10: 354-366.'; data SunitaPatel; infile 'C:\Users\Vati\Documents\StatData\Sunita.dat'; input Group
More informationUsing Stata for Categorical Data Analysis
Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata,
More informationOutline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
More informationHypothesis testing - Steps
Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =
More informationMultivariate Analysis of Variance (MANOVA)
Multivariate Analysis of Variance (MANOVA) Aaron French, Marcelo Macedo, John Poulsen, Tyler Waterson and Angela Yu Keywords: MANCOVA, special cases, assumptions, further reading, computations Introduction
More informationChapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data.
Chapter 15 Mixed Models A flexible approach to correlated data. 15.1 Overview Correlated data arise frequently in statistical analyses. This may be due to grouping of subjects, e.g., students within classrooms,
More informationSalary. Cumulative Frequency
HW01 Answering the Right Question with the Right PROC Carrie Mariner, Afton-Royal Training & Consulting, Richmond, VA ABSTRACT When your boss comes to you and says "I need this report by tomorrow!" do
More informationSimple Linear Regression, Scatterplots, and Bivariate Correlation
1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.
More information