# An Introduction to Statistical Tests for the SAS Programmer

Sara Beck, Fred Hutchinson Cancer Research Center, Seattle, WA


ABSTRACT

Often SAS programmers find themselves in situations where performing a preliminary statistical analysis is beneficial. Basic statistical tests can help programmers better understand relationships between variables and notice when data aren't as expected. There is no shortage of resources for the statistician who uses SAS, but resources for the SAS programmer wanting to learn statistics are much more difficult to find. The aim of this paper is to help SAS programmers with no statistical training become comfortable coding and interpreting statistical tests in SAS. This paper discusses the following topics:

- The definition of a statistical test, and the categorical, ordinal and interval types of variables.
- A brief discussion of commonly used statistical tests such as the t-test, chi-square, simple and multiple regression, and ANOVA.
- Examples of how commonly used statistical tests are implemented in SAS, how to code dummy variables, and which SAS options are useful.
- Tips for understanding statistical test output in SAS.

The skills required to implement and interpret most basic tests are within reach of most SAS programmers. Because most SAS programmers are not statisticians, however, it may be difficult to know where to start, what to do, or how to interpret the results.

INTRODUCTION

Statistical tests can help SAS programmers better understand their data. Along with some intuition, statistical tests allow programmers to confirm relationships between variables and to catch mistakes in underlying assumptions about the data. Statistical tests can help programmers determine whether the data characteristics they see are statistically significant (i.e., not likely to be due to random variation).
There are many resources that help statisticians learn SAS, but unfortunately far fewer for SAS programmers who would like to learn basic statistics. Fortunately, the skills required to implement statistical tests and understand their results are well within the grasp of SAS programmers. Because most programmers are not statisticians, this paper explains the most basic statistical concepts before showing how to use statistical SAS procedures and interpret the results. It is not necessary to understand all of the SAS output from a statistical test to make some use of the results. This paper is geared toward helping programmers learn the basics and understand some SAS statistical test output; obtaining in-depth statistical details may not be the best use of their time when knowing the basics will suffice. Before even trying to run a statistical test in SAS, it is necessary to be aware of a few fundamental definitions.

BASIC DEFINITIONS

HYPOTHESIS TYPES

A statistical test is a quantitative way to decide whether there is enough evidence to reasonably believe a conjecture to be true. Statisticians usually think of these conjectures as complementary pairs of claims, referred to as the null hypothesis $H_0$ and the alternative hypothesis $H_a$. "Complementary" here means that, in any given situation, the null and alternative hypotheses cannot both be true. The two claims are not given equal weight: we do not reject the null hypothesis unless there is strong evidence against it. The only possible outcomes are "reject $H_0$ in favor of $H_a$" and "do not reject $H_0$." A few examples of null and alternative hypotheses follow:

- $H_0$: There is no taste difference between diet soda and full-calorie soda. $H_a$: There is a taste difference.
- $H_0$: Drug A and drug B are equally effective. $H_a$: Drug B is more effective.
- $H_0$: The heights of adult women are normally distributed. $H_a$: The heights of adult women are not normally distributed.

Note that the null and alternative hypotheses need not be exhaustive, as in the second example above: there we implicitly assume that drug A cannot be more effective than drug B. To obtain correct results, it is important to determine whether a hypothesis test is one-tailed or two-tailed. When the hypotheses have the form $H_0: x_1 = x_2$ with $H_a: x_1 > x_2$ or $H_a: x_1 < x_2$, we call the test one-tailed; when the alternative hypothesis has the form $H_a: x_1 \neq x_2$, we call it two-tailed.

VARIABLE TYPES

Before performing any statistical test, it is important to know the nature of the variables involved. Each statistical test assumes its variables are of a certain type, and variables of another type may simply not work. For example, there is no way to take the average of a favorite-color variable, so a test comparing averages of favorite color would be nonsense.

Categorical (or nominal) variables, such as favorite color, have two or more categories and no natural way to order the values. Other examples of categorical variables include gender, blood type and favorite ice cream flavor.

Ordinal variables can be ordered, but are similar to categorical variables in that there are clear categories; the spacing between the values is not necessarily uniform. For example, the values of the survey variable Strongly Disagree, Disagree, Neither Agree nor Disagree, Agree and Strongly Agree have a clear order, but we cannot speak to the true difference between Agree and Strongly Agree. Other examples of ordinal variables include place in a competition and the ranking of minerals by hardness (the Mohs scale of hardness).

Interval variables are similar to ordinal variables, except that the values are measured in a way that makes their differences meaningful.
A runner's place in a race is measured on an ordinal scale, but the runners' actual finishing times are measured on an interval scale. Another example of an interval scale is the Celsius temperature scale.

Some statistical tests assume that sample means follow a normal distribution (i.e., the bell curve). If the sample size is sufficiently large, the central limit theorem guarantees that the sample means are approximately normally distributed.

T-TESTS

WHEN TO USE

We can use t-tests in the following three situations:

- We want to test whether a mean is significantly different from a hypothesized value.
- We want to test whether the means of two independent groups are significantly different.
- We want to test whether the means of dependent (paired) groups are significantly different.

However, to use a t-test at all, we must have interval variables that are assumed to be normally distributed.

HOW TO IMPLEMENT IN SAS

To test whether the mean of one variable is significantly different from a hypothesized value, we can use the following SAS syntax:

```sas
PROC TTEST DATA=datasetname H0=hypothesizedvalue;
   VAR variable_of_interest;
RUN;
```

If we omit the H0= option, SAS uses the default H0=0 when running the t-test. To test whether the means of two dependent groups are significantly different, we need to construct the SAS data set so that each subject has a single observation containing both measurements as two separate variables. We then use slightly different SAS syntax:

```sas
PROC TTEST DATA=datasetname;
   PAIRED dependent_variablea*dependent_variableb;
RUN;
```
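To see how one- and two-tailed p-values relate, here is a minimal sketch of the one-sample syntax above. The data set `sample`, its variable `score`, the values and the hypothesized mean of 50 are all made up for illustration, and the code assumes the `TTests` ODS table (columns `tValue`, `DF` and `Probt`) that PROC TTEST produces in recent SAS/STAT releases.

```sas
/* Hypothetical scores (made up for illustration, not the vacation data) */
data sample;
   input score @@;
   datalines;
52 55 49 58 61 54 57 60 53 56
;
run;

/* One-sample t-test of H0: mean = 50; SAS reports a two-tailed p-value */
ods output TTests=one_sample;
proc ttest data=sample h0=50;
   var score;
run;

/* One-tailed p-values computed from the saved t statistic and DF */
data one_tailed;
   set one_sample;
   p_two   = Probt;                   /* two-tailed p-value reported by SAS */
   p_upper = 1 - probt(tValue, DF);   /* Ha: mean > 50 */
   p_lower = probt(tValue, DF);       /* Ha: mean < 50 */
run;

proc print data=one_tailed noobs;
   var tValue DF p_two p_upper p_lower;
run;
```

Because the sample mean lies above 50 (so t is positive), `p_upper` is half of the two-tailed p-value, which is the halving rule used in the two-sample example. If the data pointed the other way, the one-tailed p-value for the same $H_a$ would be 1 minus half the two-tailed value, so halving is only valid when the observed difference lies in the direction stated by $H_a$. In SAS/STAT 9.2 and later, the SIDES=U or SIDES=L option on the PROC TTEST statement reports a one-tailed p-value directly.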

```sas
PROC TTEST DATA=vacation;
   CLASS Adult;
   VAR Vlength;
RUN;
```

Running the above procedure on our fictitious data yields the following SAS results:

[SAS output: The TTEST Procedure, Variable: Vlength. Tables: N, Mean, Std Dev, Std Err, Minimum and Maximum for each level of Adult and for Diff (1-2); Mean with 95% CL Mean and Std Dev with 95% CL Std Dev; t tests for the Pooled (equal variances) and Satterthwaite (unequal variances) methods with DF, t Value and Pr > |t|; Equality of Variances (Folded F) with Num DF, Den DF, F Value and Pr > F.]

Because we have a one-tailed test and the SAS-generated p-value is for a two-tailed test, we need to divide the reported p-value by 2 (valid when the observed difference lies in the direction stated by $H_a$). Our p-value indicates that the mean vacation lengths of adult-only families and families with children are not statistically different, so we do not reject our null hypothesis.

ONE-WAY ANOVA

WHEN TO USE

ANOVA can be thought of as a more general version of the t-test. If we compare only two means, ANOVA and the (pooled) t-test yield the same results. Like t-tests, ANOVA requires normally distributed interval variables. What sets ANOVA apart from the t-test is its use of an independent categorical variable: we use one-way ANOVA to test whether the means of an interval dependent variable differ across the levels of an independent categorical variable.

HOW TO IMPLEMENT IN SAS

There are two common ways to run ANOVA in SAS. The seemingly obvious way is PROC ANOVA; the other is PROC GLM, which has the added advantage of allowing a few more SAS options. Below we see how to use either procedure. PROC ANOVA has the following syntax:

```sas
PROC ANOVA DATA=datasetname;
   CLASS ClassVariable;
   MODEL Response_Variable = ClassVariable;
   MEANS ClassVariable;
RUN;
QUIT;
```
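The statement above that ANOVA and the t-test agree when only two means are compared can be checked directly: with two groups, the one-way ANOVA F statistic equals the square of the pooled (equal-variance) t statistic, and the p-values match. This is a sketch on made-up data; the data set `twogroups`, the variable names and the values are hypothetical, and the code assumes the `ModelANOVA` (PROC ANOVA) and `TTests` (PROC TTEST) ODS table names.

```sas
/* Hypothetical two-group data (made up, not the vacation data) */
data twogroups;
   input group $ y @@;
   datalines;
A 5.1 A 6.0 A 5.7 A 6.3 A 5.5
B 4.2 B 4.9 B 5.0 B 4.4 B 4.8
;
run;

/* One-way ANOVA with a two-level class variable */
ods output ModelANOVA=anova_f;
proc anova data=twogroups;
   class group;
   model y = group;
run;
quit;

/* Two-sample t-test; the Pooled row assumes equal variances, as ANOVA does */
ods output TTests=ttest_t;
proc ttest data=twogroups;
   class group;
   var y;
run;

/* Put F next to t**2 and the two p-values side by side */
data compare_ft;
   set ttest_t(where=(Method='Pooled') keep=Method tValue Probt);
   set anova_f(keep=FValue ProbF);
   t_squared = tValue**2;
run;

proc print data=compare_ft noobs;
run;
```

The Satterthwaite row generally gives a slightly different t and p-value, because it drops the equal-variance assumption that one-way ANOVA makes.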

Alternatively, we can use the following syntax for PROC GLM:

```sas
PROC GLM DATA=datasetname;
   CLASS ClassVariable;
   MODEL Response_Variable = ClassVariable;
   MEANS ClassVariable;
RUN;
QUIT;
```

There are many more options and ways to elaborate the above SAS code; it shows a simple way to run a one-way ANOVA in SAS. The MEANS statement is not required, but it is nice to include because it generates output for the means we're examining.

IMPORTANT RESULTS

SAS outputs many statistical values after running either of the above procedures. The most important values for programmers to understand are $R^2$, the F value, the degrees of freedom and the p-value. $R^2$ is the proportion of the total variance in the response variable that is explained by differences among the category means, so it helps quantify the relationship between the response variable and the class variable. A low $R^2$ indicates that differences between groups account for little of the variation in the response. SAS calculates p-values from the F value and its degrees of freedom. A low p-value (usually p < 0.05) is evidence against the null hypothesis.

EXAMPLE PROBLEM

Still examining the data set vacation, suppose we'd like to test the following hypotheses about the average salary of families who took their vacations in different seasons:

- $H_0$: There is no difference between the mean salaries of families who vacationed in different seasons.
- $H_a$: There is a difference between the mean salaries of families who vacationed in different seasons.

```sas
PROC ANOVA DATA=Vacation;
   CLASS Season;
   MODEL Salary = Season;
   MEANS Season;
RUN;
QUIT;
```

We obtain the following lengthy SAS output after running the above procedure:

[SAS output: The ANOVA Procedure, Dependent Variable: Salary (Total Household Salary). Tables: analysis of variance (Source, DF, Sum of Squares, Mean Square, F Value and Pr > F for Model, Error and Corrected Total); R-Square, Coeff Var, Root MSE and Salary Mean; the Anova SS test for Vseason; and N, Mean and Std Dev of Salary for each level of Vseason (Fall, Spring, Summer, Winter).]

Because our p-value is (much) greater than 0.05, we do not reject our null hypothesis that mean household salary does not differ across vacation seasons.

CHI-SQUARE GOODNESS OF FIT

WHEN TO USE

Programmers can use a chi-square goodness-of-fit test to assess whether the observed frequencies of a categorical variable are consistent with hypothesized proportions, that is, whether departures from those proportions could plausibly be due to chance alone.

HOW TO IMPLEMENT IN SAS

To implement a chi-square test, all we need to do is add the CHISQ option to the TABLES statement of a PROC FREQ step. To test the proportions within a categorical variable against hypothesized values, we use the following syntax:

```sas
PROC FREQ DATA=datasetname;
   TABLES variable_of_interest / CHISQ TESTP=(hypothesized_percentages);
RUN;
```

The TESTP= option is necessary if the programmer would like to specify the proportions the chi-square test is testing against. If the TESTP= option is omitted, SAS assumes the proportions of all categories are equal; for a categorical variable with 4 possible values, the SAS default is equivalent to TESTP=(25 25 25 25). The proportions listed in the TESTP= option represent the null hypothesis, and they are matched to the variable's levels in the order in which PROC FREQ displays them.

IMPORTANT RESULTS

In addition to the variable's frequency table, the CHISQ option makes SAS output the following chi-square-specific results:

- Chi-square, a number related to how much the observed counts differ from the counts expected under the hypothesized proportions. A large chi-square value comes from observed proportions quite different from what we expected, many observations in our data set, or a combination of both. It is hard to interpret a chi-square value without considering the degrees of freedom.
- Degrees of freedom (DF), the number of categories in the analyzed variable minus one.
- The p-value, the probability of observing category proportions at least as far from the expected proportions as ours if the null hypothesis were true.

A large chi-square value relative to the degrees of freedom indicates that the observed category proportions differ more drastically from the expected proportions. When the p-value is less than 0.05 the null hypothesis is typically rejected; using this 0.05 cutoff means accepting a 5% chance of rejecting the null hypothesis when it is in fact true. The lower the p-value, the stronger the evidence against the null hypothesis.
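As a sketch of what the chi-square statistic measures, the code below runs the goodness-of-fit test on made-up season counts and then recomputes the statistic by hand as the sum over categories of (observed − expected)² / expected, with DF = 4 − 1 = 3. The data set `seasons`, its counts and the output names `gof` and `by_hand` are hypothetical, and the code assumes that the PCHI keyword of the PROC FREQ OUTPUT statement stores the statistic in a variable named `_PCHI_` (with `DF_PCHI` and `P_PCHI`).

```sas
/* Hypothetical season counts (made up, not the vacation data) */
data seasons;
   length season $6;
   input season $ count;
   datalines;
Fall 20
Spring 40
Summer 90
Winter 50
;
run;

/* H0 percentages, listed in the level order Fall, Spring, Summer, Winter */
proc freq data=seasons;
   tables season / chisq testp=(15 20 40 25);
   weight count;            /* each row holds a count, not one observation */
   output out=gof pchi;     /* assumed to save _PCHI_, DF_PCHI and P_PCHI */
run;

/* The same statistic by hand: sum of (observed - expected)**2 / expected */
data by_hand;
   set seasons end=last;
   array pct[4] _temporary_ (15 20 40 25);
   n_total = 200;                         /* 20 + 40 + 90 + 50 */
   expected = n_total * pct[_n_] / 100;   /* rows are in the same order as pct */
   chisq + (count - expected)**2 / expected;
   if last then output;
   keep chisq;
run;

proc print data=gof noobs;
run;
proc print data=by_hand noobs;
run;
```

With these counts the expected values are 30, 40, 80 and 50, so the statistic is 100/30 + 100/80, about 4.58, on 3 degrees of freedom.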
EXAMPLE PROBLEM

Considering the data set vacation, we'd like to test the following hypotheses:

- $H_0$: 40% of vacations happened in summer, 25% in winter, 20% in spring and 15% in fall.
- $H_a$: The percentages of vacations in each season differ from those listed.

We can run the following SAS procedure to test these hypotheses:

```sas
PROC FREQ DATA=vacation;
   /* Percentages in level order: Fall, Spring, Summer, Winter */
   TABLES Season / CHISQ TESTP=(15 20 40 25);
RUN;
```

and obtain the following SAS results:

[SAS output: The FREQ Procedure, Vseason (Season Vacation Occurred). Frequency table with Frequency, Percent, Test Percent, Cumulative Frequency and Cumulative Percent for Fall, Spring, Summer and Winter. Chi-Square Test for Specified Proportions: DF = 3, Pr > ChiSq < .0001. Sample Size = 199.]

Because of the small p-value (< 0.0001), we reject the null hypothesis in favor of the alternative hypothesis.

LINEAR REGRESSION

WHEN TO USE

Simple linear regression is used to test how well one variable predicts another. Multiple linear regression is very similar, but tests how well several variables together predict a variable of interest. To use linear regression, the response must be an interval variable, and the model's errors (residuals) are assumed to be normally distributed. When using multiple linear regression, we additionally want the predictor variables not to be strongly correlated with one another.

HOW TO IMPLEMENT IN SAS

Running simple and multiple linear regressions is very straightforward in SAS. Linear regression with one predictor takes the following syntax:

```sas
PROC REG DATA=datasetname;
   MODEL response = predictor / OPTIONS;   /* OPTIONS is a placeholder */
RUN;
QUIT;
```

Multiple linear regression has approximately the same syntax as simple linear regression; we simply add the desired predictor variables to the MODEL statement:

```sas
PROC REG DATA=datasetname;
   MODEL response = predictora predictorb predictorc / OPTIONS;
RUN;
QUIT;
```

As with ANOVA, PROC GLM can also be used to run a linear regression.

IMPORTANT RESULTS

In addition to indicating whether there is a statistically significant linear relationship between variables, the SAS results provide the slope and intercept of the best-fit line through our data points. For a linear model, we hope the value of R-square is close to 1, because it measures the proportion of the variance in the response variable that the predictors explain. SAS lists the intercept and slope of the best-fit line in the Parameter Estimates table regardless of how well that line models the data, so we still use the p-value to tell whether our tests are statistically significant.

EXAMPLE PROBLEM

Still examining the data set vacation, we can test the following hypotheses with linear regression:

- $H_0$: There is no relationship between salary and amount spent on vacation.
- $H_a$: There is a linear relationship between salary and amount spent on vacation.

We can run the following SAS code to run the test:

```sas
PROC REG DATA=vacation;
   MODEL Vcost = Salary;
RUN;
QUIT;
```

and ultimately obtain the following results:

[SAS output: The REG Procedure, Model: MODEL1, Dependent Variable: Vcost (Cost of Vacation). Number of Observations Read: 199; Number of Observations Used: 199. Analysis of Variance (Source, DF, Sum of Squares, Mean Square, F Value, Pr > F), Model Pr > F < .0001; Root MSE, R-Square, Dependent Mean, Adj R-Sq and Coeff Var; Parameter Estimates (DF, Estimate, Standard Error, t Value, Pr > |t|) for Intercept and Salary (Total Household Salary), Salary Pr > |t| < .0001.]

Because the p-value is so small, we reject the null hypothesis that there is no relationship between salary and cost of vacation. The parameter estimate shows a positive relationship between salary and vacation cost. From the above results, we also know the best-fit line for our model is: vcost = 0.07(salary)
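A sketch of where the slope and intercept in the Parameter Estimates table come from: least squares chooses slope = Sxy / Sxx and intercept = ȳ − slope × x̄. The data set `line` and its five (x, y) pairs are made up so the arithmetic can be done by hand (x̄ = 3, ȳ = 6, Sxy = 19.7, Sxx = 10), and `est` is a hypothetical name for the OUTEST= data set, which stores the estimates in variables named `Intercept` and `x`.

```sas
/* Hypothetical (x, y) pairs, not the vacation data.
   By hand: xbar = 3, ybar = 6, Sxy = 19.7, Sxx = 10,
   so slope = 19.7 / 10 = 1.97 and intercept = 6 - 1.97*3 = 0.09 */
data line;
   input x y;
   datalines;
1 2.1
2 3.9
3 6.2
4 7.8
5 10.0
;
run;

/* DATA= is required before the data set name; OUTEST= saves the estimates */
proc reg data=line outest=est;
   model y = x;
run;
quit;

proc print data=est noobs;
   var Intercept x;
run;
```

The same least-squares formulas apply to any simple regression, including the Vcost-on-Salary model; only the data change.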

CONCLUSION

Statistical tests may initially seem intimidating to SAS programmers with limited formal statistics background. Fortunately, SAS programmers can still benefit from statistical tests with only basic statistical knowledge; statistical fundamentals are well within programmers' reach. Statistical tests can help programmers learn whether characteristics of their data are due purely to chance or are statistically significant, predict data values for future updates, discover data features, and ultimately maintain higher data quality standards.

ACKNOWLEDGMENTS

Special thanks to Nate Derby and Ben Ice for their help reviewing this paper.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at: Sara Beck, Fred Hutchinson Cancer Research Center.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.


### One-Way Analysis of Variance

One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

### Comparing Means in Two Populations

Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

### Data Types. 1. Continuous 2. Discrete quantitative 3. Ordinal 4. Nominal. Figure 1

Data Types By Tanya Hoskin, a statistician in the Mayo Clinic Department of Health Sciences Research who provides consultations through the Mayo Clinic CTSA BERD Resource. Don t let the title scare you.

### Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric

### Two-sample hypothesis testing, II 9.07 3/16/2004

Two-sample hypothesis testing, II 9.07 3/16/004 Small sample tests for the difference between two independent means For two-sample tests of the difference in mean, things get a little confusing, here,

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

### Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

### IBM SPSS Statistics for Beginners for Windows

ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning

### For example, enter the following data in three COLUMNS in a new View window.

Statistics with Statview - 18 Paired t-test A paired t-test compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the

### SPSS on two independent samples. Two sample test with proportions. Paired t-test (with more SPSS)

SPSS on two independent samples. Two sample test with proportions. Paired t-test (with more SPSS) State of the course address: The Final exam is Aug 9, 3:30pm 6:30pm in B9201 in the Burnaby Campus. (One

### Linear Models in STATA and ANOVA

Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

### Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

### Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

### Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

### Is it statistically significant? The chi-square test

UAS Conference Series 2013/14 Is it statistically significant? The chi-square test Dr Gosia Turner Student Data Management and Analysis 14 September 2010 Page 1 Why chi-square? Tests whether two categorical

### REGRESSION LINES IN STATA

REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression

### Adverse Impact Ratio for Females (0/ 1) = 0 (5/ 17) = 0.2941 Adverse impact as defined by the 4/5ths rule was not found in the above data.

1 of 9 12/8/2014 12:57 PM (an On-Line Internet based application) Instructions: Please fill out the information into the form below. Once you have entered your data below, you may select the types of analysis

### MEASURES OF LOCATION AND SPREAD

Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Estimation of σ 2, the variance of ɛ

Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated

### MULTIPLE REGRESSION WITH CATEGORICAL DATA

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

### Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters

### Statistical Functions in Excel

Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Using R for Linear Regression

Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

### Dongfeng Li. Autumn 2010

Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

### Chi Square Tests. Chapter 10. 10.1 Introduction

Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

### Regression step-by-step using Microsoft Excel

Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

### General Procedure for Hypothesis Test. Five types of statistical analysis. 1. Formulate H 1 and H 0. General Procedure for Hypothesis Test

Five types of statistical analysis General Procedure for Hypothesis Test Descriptive Inferential Differences Associative Predictive What are the characteristics of the respondents? What are the characteristics

### Directions for using SPSS

Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

### Mind on Statistics. Chapter 15

Mind on Statistics Chapter 15 Section 15.1 1. A student survey was done to study the relationship between class standing (freshman, sophomore, junior, or senior) and major subject (English, Biology, French,

### HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

### Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) 90. 35 (d) 20 (e) 25 (f) 80. Totals/Marginal 98 37 35 170

Work Sheet 2: Calculating a Chi Square Table 1: Substance Abuse Level by ation Total/Marginal 63 (a) 17 (b) 10 (c) 90 35 (d) 20 (e) 25 (f) 80 Totals/Marginal 98 37 35 170 Step 1: Label Your Table. Label

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### 5. Ordinal regression: cumulative categories proportional odds. 6. Ordinal regression: comparison to single reference generalized logits

Lecture 23 1. Logistic regression with binary response 2. Proc Logistic and its surprises 3. quadratic model 4. Hosmer-Lemeshow test for lack of fit 5. Ordinal regression: cumulative categories proportional

### Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through

### Random effects and nested models with SAS

Random effects and nested models with SAS /************* classical2.sas ********************* Three levels of factor A, four levels of B Both fixed Both random A fixed, B random B nested within A ***************************************************/

### Multiple Linear Regression

Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

### RECRUITERS PRIORITIES IN PLACING MBA FRESHER: AN EMPIRICAL ANALYSIS

RECRUITERS PRIORITIES IN PLACING MBA FRESHER: AN EMPIRICAL ANALYSIS Miss Sangeeta Mohanty Assistant Professor, Academy of Business Administration, Angaragadia, Balasore, Orissa, India ABSTRACT Recruitment

### SPSS: Descriptive and Inferential Statistics. For Windows

For Windows August 2012 Table of Contents Section 1: Summarizing Data...3 1.1 Descriptive Statistics...3 Section 2: Inferential Statistics... 10 2.1 Chi-Square Test... 10 2.2 T tests... 11 2.3 Correlation...