An Introduction to Statistical Tests for the SAS Programmer
Sara Beck, Fred Hutchinson Cancer Research Center, Seattle, WA


ABSTRACT

Often SAS programmers find themselves in situations where performing a preliminary statistical analysis is beneficial. Basic statistical tests can help programmers better understand relationships between variables and notice when data aren't as expected. There is no shortage of resources for the statistician who uses SAS, but resources for the SAS programmer wanting to learn statistics are much more difficult to find. The aim of this paper is to help SAS programmers with no statistical training become comfortable coding and interpreting statistical tests in SAS. This paper discusses the following topics:

• An explanation of the definition of a statistical test and categorical, ordinal and interval types of variables.
• A brief discussion of commonly used statistical tests such as t-test, chi-square, simple and multiple regression and ANOVA.
• Examples of how commonly used statistical tests are implemented in SAS, how to code dummy variables and what SAS options are useful.
• Tips for understanding statistical test output in SAS.

Skills required to implement and interpret most basic tests are at an accessible level for most SAS programmers. Because SAS programmers are not statisticians, it may be difficult to know where to start, what to do, or how to interpret the results.

INTRODUCTION

Statistical tests can help SAS programmers better understand their data. Along with some intuition, statistical tests allow programmers to confirm relationships between variables and to catch mistakes in underlying assumptions about the data. Statistical tests can help programmers determine whether the data characteristics they see are statistically significant (i.e., not likely to be due to random variations).
There are many resources for statisticians to better learn SAS, but unfortunately there are far fewer resources for SAS programmers who would like to learn basic statistics. Fortunately the skills required to implement statistical tests and understand their results are not out of reach for SAS programmers. Understanding that most programmers are not statisticians, this paper aims to explain the most basic statistical concepts before explaining how to use statistical SAS procedures and interpret the results. It is not necessary to understand all of the SAS output from a statistical test to make some use of the results. This paper is geared toward helping programmers learn the basics and understand some SAS statistical test output. Most programmers are not statisticians, and obtaining in-depth statistical details may not be the best use of their time when knowing the basics will suffice. Before even trying to run a statistical test in SAS, it is necessary to be aware of a few fundamental definitions.

BASIC DEFINITIONS

HYPOTHESIS TYPES

A statistical test is a quantitative way to decide whether there is enough evidence to reasonably believe a conjecture to be true. Often statisticians think of these conjectures as complementary pairs of claims. These claims are usually referred to as the null hypothesis, H0, and the alternative hypothesis, Ha. Use of the term complementary is meant to imply that for any given situation the null and alternative hypotheses cannot both be true. These claims are not given the same weight; we do not reject the null hypothesis unless there is strong evidence against it. The only possible outcomes are "reject H0 in favor of Ha" or "do not reject H0". A few examples of null and alternative hypotheses are the following:

H0: There is no taste difference between diet soda and full calorie soda.
Ha: There is a taste difference.

H0: Drug A and drug B are equally effective.
Ha: Drug B is more effective.

H0: The distribution of heights of adult women is normally distributed.
Ha: The distribution of heights of adult women is not normally distributed.

Note that the null and alternative hypotheses need not be exhaustive, as in the second example above. Here we implicitly assume that drug A cannot be more effective than drug B. To obtain correct results, it is important to determine whether the hypothesis test is one-tailed or two-tailed. When the null and alternative hypotheses are of the form H0: x1 = x2 with Ha: x1 > x2 or Ha: x1 < x2, we call that a one-tailed test, and when the alternative hypothesis is of the form Ha: x1 ≠ x2, we call that a two-tailed test.

VARIABLE TYPES

Before one is able to perform any statistical tests with variables, it is important to know the nature of the variables involved. Each statistical test assumes its variables are of a certain type, and a test applied to variables of another type may give meaningless results. For example, with a variable for favorite color, there is no way to take the average, so a test comparing averages of favorite color would be nonsense.

Categorical or nominal variables are ones such as favorite color, which have two or more categories and no way to order the values. Other examples of categorical variables include gender, blood type and favorite ice cream flavor.

Ordinal variables can be ordered, but are similar to categorical variables in that there are clear categories. The spacing between values is not necessarily uniform. For example, if we consider the values of a survey variable (Strongly Disagree, Disagree, Neither Agree nor Disagree, Agree, and Strongly Agree), we see that there is a clear order, but we cannot speak to the true difference between Agree and Strongly Agree. Other examples of ordinal variables include place in a competition or the ranking of minerals by hardness (the Mohs scale of hardness).

Interval variables are similar to ordinal variables, except that values are measured in a way where their differences are meaningful.
The place number of runners in a race is considered an ordinal scale, but if we consider the actual times of runners rather than their place, this would be an interval scale. Another example of an interval scale is the Celsius temperature scale.

Some statistical tests assume the sample means follow a normal distribution (i.e., the bell curve). If the sample size is sufficiently large, the central limit theorem implies that the sample means are approximately normally distributed.

T-TESTS

WHEN TO USE

We can use t-tests in the following three situations:

• We want to test whether the mean is significantly different than a hypothesized value.
• We want to test whether means for two independent groups are significantly different.
• We want to test whether means for dependent or paired groups are significantly different.

However, to use a t-test at all, we must have interval variables that are assumed normally distributed.

HOW TO IMPLEMENT IN SAS

To test whether the mean of one variable is significantly different than a hypothesized value, we can use the following SAS syntax:

   PROC TTEST DATA=datasetname H0=hypothesizedvalue;
      VAR variable_of_interest;
   RUN;

If we omitted the H0=hypothesizedvalue option, SAS would use the default of H0=0 when running the t-test.

In order to test whether the means of two dependent groups are significantly different, we need to construct the SAS data set so that both measurements for a subject are stored as two variables in the same observation, because the PAIRED statement compares two variables within each observation. We use the following slightly different SAS syntax:

   PROC TTEST DATA=datasetname;
      PAIRED dependent_variablea*dependent_variableb;
   RUN;
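As a hedged illustration of the one-sample and paired syntax, the sketch below builds a tiny made-up data set and runs both tests. The data set name work.bp_demo, the variables subject, before and after, the values, and the hypothesized mean of 140 are all assumptions chosen for illustration; they do not come from the paper.

```sas
/* Hypothetical data: one observation per subject, with both      */
/* measurements stored as separate variables (before and after).  */
DATA work.bp_demo;
   INPUT subject before after;
   DATALINES;
1 142 138
2 150 147
3 135 136
4 160 151
5 148 144
6 155 149
;
RUN;

/* One-sample t-test: is the mean of BEFORE different from 140?   */
/* The value 140 is an arbitrary hypothesized mean.               */
PROC TTEST DATA=work.bp_demo H0=140;
   VAR before;
RUN;

/* Paired t-test: is the mean of (before - after) different       */
/* from 0? PAIRED compares two variables within each observation. */
PROC TTEST DATA=work.bp_demo;
   PAIRED before*after;
RUN;
```

With real data, only the data set name and the variable names would need to change; the DATA step here exists solely to make the sketch self-contained.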
Testing whether the means of two independent groups are different is the most complicated type of t-test. For this type of t-test, we need a classification variable, often coded as a dummy variable, that identifies the group each observation belongs to. For PROC TTEST the class variable must have exactly two levels; a common choice is a 0/1 binary indicator variable. An example of a class variable might be gender, where gender=1 when the observation is male or gender=0 when the observation is female. Another example could be a vital status variable that equals 1 when a person is alive and 0 when the person is dead. Once a class variable has been created, we use the following SAS syntax to perform the desired t-test:

   PROC TTEST DATA=datasetname;
      CLASS classvariable;
      VAR variable_of_interest;
   RUN;

IMPORTANT RESULTS

SAS will also output mean, standard deviation and confidence interval information pertaining to our t-test. For the purposes of rejecting or not rejecting the null hypothesis, we'll direct our attention towards the t value, degrees of freedom and p-value. The p-value is your go-to value for this test. The p-value indicates how likely a difference at least as large as the one observed would be to occur from chance alone if the null hypothesis were true. The t value itself is hard to interpret without the degrees of freedom. In layman's terms, degrees of freedom is a value related to the number of observations and how variability is estimated. A negative or positive sign of the t value indicates the observed mean is lower or higher than predicted, respectively. SAS uses the t value along with the degrees of freedom to calculate the p-value. SAS will display the p-value under the column heading Pr > |t|, which is the correct p-value for a two-tailed test. For a one-tailed test, we simply divide the default SAS p-value by 2, provided the observed difference lies in the direction stated by the alternative hypothesis. Usually people will reject the null hypothesis with a p-value less than 0.05, though this cutoff is arbitrary.

EXAMPLE PROBLEM

For all example problems throughout this paper, we'll consider a fictitious data set called vacation, containing data collected by a small international airline.
Using PROC CONTENTS, we learn more about the data set and view the results below.

Alphabetic List of Variables and Attributes

 #  Variable         Type  Len  Format  Informat  Label
 9  Adult            Num     8                    All Adult Household
 6  ContinentChange  Num     8                    Vacationed on a Foreign Continent
 5  CountryChange    Num     8                    Vacationed in a Foreign Country
 1  Family1D         Num     8                    Family1D
 8  HouseholdSize    Num     8                    Number of Members in Household
 7  Salary           Num     8                    Total Household Salary
 4  Vcontinent       Char   13  $13.    $13.      Continent Vacation Took Place
 3  Vcost            Num     8                    Vacation Cost
 2  Vlength          Num     8                    Length of Vacation
10  Season           Char    6  $6.     $6.       Season Vacation Occurred

Let's test to see if the mean length of vacation (Vlength) is different between families of only adults and families of adults and children (Adult). We can set up the following pair of hypotheses:

H0: There is no difference between the mean length of vacation for all-adult families and adult-children families.
Ha: The mean vacation length for all-adult families is greater than the mean vacation length for adult-children families.

Assuming the data set vacation is in current SAS memory, we can write
   PROC TTEST DATA=vacation;
      CLASS Adult;
      VAR Vlength;
   RUN;

Running the above procedure for our fictitious data yields SAS results laid out as follows:

The TTEST Procedure
Variable: Vlength (Vlength)

Adult        N   Mean   Std Dev   Std Err   Minimum   Maximum
Diff (1-2)

Adult        Method          Mean   95% CL Mean   Std Dev   95% CL Std Dev
Diff (1-2)   Pooled
Diff (1-2)   Satterthwaite

Method          Variances   DF   t Value   Pr > |t|
Pooled          Equal
Satterthwaite   Unequal

Equality of Variances
Method     Num DF   Den DF   F Value   Pr > F
Folded F

Because we have a one-tailed test, and the SAS-generated p-value is for a two-tailed test, we need to divide the calculated p-value by 2. The resulting p-value indicates that the mean vacation lengths of adult-only families and adult-children families are not statistically different, so we do not reject our null hypothesis.

ONE-WAY ANOVA

WHEN TO USE

ANOVA can be thought of as a more generalized version of a t-test. If we compare only two means, both ANOVA and the t-test will yield the same results. Like t-tests, ANOVA requires normal interval variables. The aspect of ANOVA that is different from t-tests is the requirement of an independent categorical variable. We want to use one-way ANOVA when testing to see if the means of the interval dependent variable are different according to the independent categorical variable.

HOW TO IMPLEMENT IN SAS

There are two common ways to run ANOVA in SAS. A seemingly obvious way is PROC ANOVA; the other is PROC GLM, which has the added advantage of allowing a few more SAS options. Below we see how we can use either procedure. PROC ANOVA has the following syntax:

   PROC ANOVA DATA=datasetname;
      CLASS ClassVariable;
      MODEL Response_Variable=ClassVariable;
      MEANS ClassVariable;
   RUN;
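To try the PROC ANOVA syntax without access to the vacation data, a minimal sketch with made-up data might look like the following. The data set work.anova_demo, the variables group and score, and all values are hypothetical and chosen only so the example is self-contained.

```sas
/* Hypothetical data: an interval response (score) measured in    */
/* three categories (group), three observations per category.     */
DATA work.anova_demo;
   INPUT group $ score;
   DATALINES;
A 12
A 15
A 14
B 18
B 21
B 19
C 13
C 11
C 16
;
RUN;

/* One-way ANOVA: do the mean scores differ across groups?        */
PROC ANOVA DATA=work.anova_demo;
   CLASS group;           /* the independent categorical variable */
   MODEL score = group;   /* response = class variable            */
   MEANS group;           /* optional: prints the group means     */
RUN;
QUIT;                     /* PROC ANOVA stays active until QUIT   */
```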
Alternatively, we can use the following syntax for PROC GLM:

   PROC GLM DATA=datasetname;
      CLASS ClassVariable;
      MODEL Response_Variable=ClassVariable;
      MEANS ClassVariable;
   RUN;

There are many more options and ways the above SAS code can be elaborated; the above shows a simple way to run a one-way ANOVA in SAS. MEANS is not a required part of the procedure, but is nice to include as it will generate output for the means we're examining.

IMPORTANT RESULTS

SAS will output many statistical values after running either of the above procedures. The most important values for programmers to understand from the SAS output are R², the F value, degrees of freedom, and the p-value. R² is the proportion of the variance in the response variable that is explained by differences between the category means. The R² value helps quantify the relationship between the response variable and the class variable categories. A low R² indicates that there isn't much difference between groups relative to the variation within groups. SAS calculates p-values from the F value and degrees of freedom. A low p-value (usually p<0.05) is evidence against the null hypothesis.

EXAMPLE PROBLEM

Still examining the data set vacation, suppose we'd like to test the following hypotheses about the average salary for families who took their vacations in different seasons.

H0: There is no difference between the mean salaries of families who vacationed in different seasons.
Ha: There is a difference between mean salaries of families who vacationed in different seasons.

   PROC ANOVA DATA=Vacation;
      CLASS Season;
      MODEL Salary=Season;
      MEANS Season;
   RUN;

We obtain the following lengthy SAS output after running the above procedure:
The ANOVA Procedure
Dependent Variable: Salary   Total Household Salary

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model
Error
Corrected Total

R-Square   Coeff Var   Root MSE   Salary Mean

Source    DF   Anova SS   Mean Square   F Value   Pr > F
Vseason

Level of Vseason   N   Salary Mean   Salary Std Dev
Fall
Spring
Summer
Winter

Because our p-value is (much) greater than .05, we do not reject our null hypothesis that there is no difference in mean household salary across vacation seasons.

CHI-SQUARE GOODNESS OF FIT

WHEN TO USE

Programmers can use the chi-square goodness-of-fit test to assess whether the observed frequencies of a categorical variable could plausibly have arisen by chance under a set of hypothesized proportions. A chi-square test is appropriate when testing whether the proportions of a categorical variable match hypothesized values.
HOW TO IMPLEMENT IN SAS

To implement a chi-square test, all we need to do is add the CHISQ option to a frequency procedure. To test the proportions within a categorical variable against a hypothesis, we use the following syntax:

   PROC FREQ DATA=datasetname;
      TABLES variable_of_interest / CHISQ TESTP=(list of proportions);
   RUN;

The TESTP= option is necessary if the programmer would like to specify what proportions the chi-square test is testing against. The proportions are listed in the order in which SAS displays the levels of the variable. If the TESTP= option is omitted, SAS will assume the proportions within the category are equal. For a categorical variable with 4 possible values, the SAS default would be equivalent to TESTP=(0.25 0.25 0.25 0.25). The proportions indicated in the TESTP= option represent the null hypothesis.

IMPORTANT RESULTS

In addition to variable frequency results, SAS will output the following chi-square specific results due to the CHISQ option:

• Chi-Square, which is a number related to how much observations differ from the expected proportions. A large chi-square value comes from observed proportions quite different than what we expected, many observations in our data set, or a combination of both. It is hard to interpret a chi-square value without considering degrees of freedom.
• Degrees of freedom (DF), which are the number of categories in the analyzed variable minus one.
• P-value, which indicates how likely the observed category proportions were to occur from chance alone based on our expected category proportions.

A large chi-square value relative to the degrees of freedom indicates the observed category proportions are more drastically different than the expected proportions. When the p-value is less than .05 the null hypothesis is typically rejected, as a p-value cutoff of .05 corresponds to a less than 5% chance of rejecting the null hypothesis when it is indeed true. The lower a p-value is, the more significant the results.
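For readers who want to see where the chi-square value and its degrees of freedom come from, the standard goodness-of-fit statistic can be written as below. The symbols are assumptions introduced here for illustration: k is the number of categories, n the sample size, O_i the observed count in category i, p_i the hypothesized proportion, and E_i the expected count. The worked numbers (n = 100, four equally likely categories, observed counts 30, 20, 25, 25) are made up and are not taken from the vacation data.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n\,p_i,
\qquad \mathrm{DF} = k - 1

% Hypothetical worked example: n = 100, p_i = 0.25, so E_i = 25
\chi^2 = \frac{(30-25)^2 + (20-25)^2 + (25-25)^2 + (25-25)^2}{25}
       = \frac{25 + 25 + 0 + 0}{25} = 2,
\qquad \mathrm{DF} = 4 - 1 = 3
```

The formula shows why both larger deviations from the expected counts and a larger sample size push the chi-square value up.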
EXAMPLE PROBLEM

Considering the data set vacation, we'd like to test the following hypotheses:

H0: 40% of vacations happened in summer, 25% happened in winter, 20% happened in spring and 15% happened in fall.
Ha: The percentages of vacations in each season are different than those listed.

We can run the following SAS procedure to test these hypotheses. The proportions from H0 are listed in the order in which SAS displays the seasons (Fall, Spring, Summer, Winter):

   PROC FREQ DATA=vacation;
      TABLES Season / CHISQ TESTP=(0.15 0.20 0.40 0.25);
   RUN;

And obtain the following SAS results:
The FREQ Procedure

Season Vacation Occurred

Vseason   Frequency   Percent   Test Percent   Cumulative Frequency   Cumulative Percent
Fall
Spring
Summer
Winter

Chi-Square Test for Specified Proportions
Chi-Square
DF            3
Pr > ChiSq    <.0001

Sample Size = 199

Due to the small size of the p-value (<0.0001), we reject the null hypothesis in favor of the alternative hypothesis.

LINEAR REGRESSION

WHEN TO USE

Simple linear regression is used when one wants to test how well a variable predicts another variable. Multiple linear regression is very similar, but allows one to test how well multiple variables predict a variable of interest. In order to use linear regression, we must be examining normally distributed interval variables. When using multiple linear regression, we additionally assume the predictor variables are independent of one another.

HOW TO IMPLEMENT IN SAS

Running simple and multiple linear regressions is very straightforward in SAS. Linear regression with one variable takes the following syntax, where options stands for any MODEL statement options:

   PROC REG DATA=datasetname;
      MODEL response = predictor / options;
   RUN;

Multiple linear regression has approximately the same syntax as simple linear regression. We simply add the additional desired predictor variables to the MODEL statement, like so:

   PROC REG DATA=datasetname;
      MODEL response = predictora predictorb predictorc / options;
   RUN;

As with ANOVA, PROC GLM can also be used to run a linear regression.

IMPORTANT RESULTS

In addition to indicating whether or not there's a statistically significant linear relationship between variables, the SAS results will provide slope and intercept values for the best-fit line through our data points. For a linear model, we hope the value of R-square is close to 1, as it is a measure of how well the predictor and response variables vary
together. SAS will list the intercept and slope of the best-fit line under the Parameter Estimates heading, regardless of how well the best-fit line models the data. We still will use the p-value to tell whether our tests are statistically significant.

EXAMPLE PROBLEM

Still examining the data set vacation, we can test the following hypotheses with linear regression.

H0: There is no relationship between salary and amount spent on vacation.
Ha: There is a linear relationship between salary and amount spent on vacation.

We can run the following SAS code to run the test,

   PROC REG DATA=vacation;
      MODEL Vcost = Salary;
   RUN;

and ultimately obtain the following results.

The REG Procedure
Model: MODEL1
Dependent Variable: Vcost   Cost of Vacation

Number of Observations Read   199
Number of Observations Used   199

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                                                           <.0001
Error
Corrected Total

Root MSE         R-Square
Dependent Mean   Adj R-Sq
Coeff Var

Parameter Estimates
Variable    Label                    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept
Salary      Total Household Salary                                                        <.0001

Because the p-value is so small, we reject the null hypothesis that there is no relationship between salary and cost of vacation. The parameter estimate shows a positive relationship between salary and vacation cost. From the above results, we also know the best-fit line for our model is: Vcost = 0.07(Salary) plus the intercept estimate reported in the Parameter Estimates table.
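The regression steps above can be sketched on a small stand-alone data set. Everything in this sketch (the data set work.reg_demo and the salary and vcost values) is made up for illustration and is not the paper's vacation data, so the estimates it produces will differ from the results shown above.

```sas
/* Hypothetical data: household salary and vacation cost.         */
DATA work.reg_demo;
   INPUT salary vcost;
   DATALINES;
40000 2900
55000 3700
62000 4500
48000 3300
75000 5400
90000 6200
;
RUN;

/* Simple linear regression of vacation cost on salary.           */
/* The Parameter Estimates table gives the intercept and slope    */
/* of the best-fit line; Pr > |t| for salary tests the slope.     */
PROC REG DATA=work.reg_demo;
   MODEL vcost = salary;
RUN;
QUIT;   /* PROC REG stays active until QUIT or another step       */
```

Note that the DATA= keyword is required in the PROC REG statement; writing only the data set name after PROC REG is a syntax error.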
CONCLUSION

Statistical tests may initially seem intimidating to SAS programmers with limited formal statistics background. Fortunately, SAS programmers can still benefit from statistical tests with only a basic statistical knowledge. Statistical fundamentals are well within the reach of programmers. Use of statistical tests can help programmers learn whether characteristics of their data are based purely on chance or are statistically significant, predict data values for future updates, discover data features, and ultimately maintain higher data quality standards.

ACKNOWLEDGMENTS

Special thanks to Nate Derby and Ben Ice for their help reviewing this paper.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Sara Beck
Fred Hutchinson Cancer Research Center

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
More informationCase Study in Data Analysis Does a drug prevent cardiomegaly in heart failure?
Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure? Harvey Motulsky hmotulsky@graphpad.com This is the first case in what I expect will be a series of case studies. While I mention
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3 Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationIntroduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) 
More informationConsider a study in which. How many subjects? The importance of sample size calculations. An insignificant effect: two possibilities.
Consider a study in which How many subjects? The importance of sample size calculations Office of Research Protections Brown Bag Series KB Boomer, Ph.D. Director, boomer@stat.psu.edu A researcher conducts
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationSTATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4
STATISTICS 8, FINAL EXAM NAME: KEY Seat Number: Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 Make sure you have 8 pages. You will be provided with a table as well, as a separate
More informationDEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
More informationLet s explore SAS Proc TTest
Let s explore SAS Proc TTest Ana Yankovsky Research Statistical Analyst Screening Programs, AHS Ana.Yankovsky@albertahealthservices.ca Goals of the presentation: 1. Look at the structure of Proc TTEST;
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationOneWay Analysis of Variance
OneWay Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We
More informationComparing Means in Two Populations
Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we
More informationData Types. 1. Continuous 2. Discrete quantitative 3. Ordinal 4. Nominal. Figure 1
Data Types By Tanya Hoskin, a statistician in the Mayo Clinic Department of Health Sciences Research who provides consultations through the Mayo Clinic CTSA BERD Resource. Don t let the title scare you.
More informationNonparametric TwoSample Tests. Nonparametric Tests. Sign Test
Nonparametric TwoSample Tests Sign test MannWhitney Utest (a.k.a. Wilcoxon twosample test) KolmogorovSmirnov Test Wilcoxon SignedRank Test TukeyDuckworth Test 1 Nonparametric Tests Recall, nonparametric
More informationTwosample hypothesis testing, II 9.07 3/16/2004
Twosample hypothesis testing, II 9.07 3/16/004 Small sample tests for the difference between two independent means For twosample tests of the difference in mean, things get a little confusing, here,
More informationInferential Statistics
Inferential Statistics Sampling and the normal distribution Zscores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are
More informationVariables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.
The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide
More informationPredictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 RSq = 0.0% RSq(adj) = 0.
Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged
More informationIBM SPSS Statistics for Beginners for Windows
ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning
More informationFor example, enter the following data in three COLUMNS in a new View window.
Statistics with Statview  18 Paired ttest A paired ttest compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the
More informationSPSS on two independent samples. Two sample test with proportions. Paired ttest (with more SPSS)
SPSS on two independent samples. Two sample test with proportions. Paired ttest (with more SPSS) State of the course address: The Final exam is Aug 9, 3:30pm 6:30pm in B9201 in the Burnaby Campus. (One
More informationLinear Models in STATA and ANOVA
Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 42 A Note on NonLinear Relationships 44 Multiple Linear Regression 45 Removal of Variables 48 Independent Samples
More informationBiostatistics: Types of Data Analysis
Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS
More informationCalculating PValues. Parkland College. Isela Guerra Parkland College. Recommended Citation
Parkland College A with Honors Projects Honors Program 2014 Calculating PValues Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating PValues" (2014). A with Honors Projects.
More informationTesting Group Differences using Ttests, ANOVA, and Nonparametric Measures
Testing Group Differences using Ttests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 354870348 Phone:
More informationIs it statistically significant? The chisquare test
UAS Conference Series 2013/14 Is it statistically significant? The chisquare test Dr Gosia Turner Student Data Management and Analysis 14 September 2010 Page 1 Why chisquare? Tests whether two categorical
More informationREGRESSION LINES IN STATA
REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression
More informationAdverse Impact Ratio for Females (0/ 1) = 0 (5/ 17) = 0.2941 Adverse impact as defined by the 4/5ths rule was not found in the above data.
1 of 9 12/8/2014 12:57 PM (an OnLine Internet based application) Instructions: Please fill out the information into the form below. Once you have entered your data below, you may select the types of analysis
More informationMEASURES OF LOCATION AND SPREAD
Paper TU04 An Overview of Nonparametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the
More informationIBM SPSS Statistics 20 Part 4: ChiSquare and ANOVA
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: ChiSquare and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the
More informationData Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More informationEstimation of σ 2, the variance of ɛ
Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated
More informationMULTIPLE REGRESSION WITH CATEGORICAL DATA
DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting
More informationIntroduction. Hypothesis Testing. Hypothesis Testing. Significance Testing
Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters
More informationStatistical Functions in Excel
Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationUsing R for Linear Regression
Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional
More informationDongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
More informationChi Square Tests. Chapter 10. 10.1 Introduction
Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square
More informationRegression stepbystep using Microsoft Excel
Step 1: Regression stepbystep using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression
More informationGeneral Procedure for Hypothesis Test. Five types of statistical analysis. 1. Formulate H 1 and H 0. General Procedure for Hypothesis Test
Five types of statistical analysis General Procedure for Hypothesis Test Descriptive Inferential Differences Associative Predictive What are the characteristics of the respondents? What are the characteristics
More informationDirections for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
More informationMind on Statistics. Chapter 15
Mind on Statistics Chapter 15 Section 15.1 1. A student survey was done to study the relationship between class standing (freshman, sophomore, junior, or senior) and major subject (English, Biology, French,
More informationHYPOTHESIS TESTING (ONE SAMPLE)  CHAPTER 7 1. used confidence intervals to answer questions such as...
HYPOTHESIS TESTING (ONE SAMPLE)  CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men
More informationCONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE
1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,
More informationRecommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) 90. 35 (d) 20 (e) 25 (f) 80. Totals/Marginal 98 37 35 170
Work Sheet 2: Calculating a Chi Square Table 1: Substance Abuse Level by ation Total/Marginal 63 (a) 17 (b) 10 (c) 90 35 (d) 20 (e) 25 (f) 80 Totals/Marginal 98 37 35 170 Step 1: Label Your Table. Label
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More information5. Ordinal regression: cumulative categories proportional odds. 6. Ordinal regression: comparison to single reference generalized logits
Lecture 23 1. Logistic regression with binary response 2. Proc Logistic and its surprises 3. quadratic model 4. HosmerLemeshow test for lack of fit 5. Ordinal regression: cumulative categories proportional
More informationData Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
More informationRandom effects and nested models with SAS
Random effects and nested models with SAS /************* classical2.sas ********************* Three levels of factor A, four levels of B Both fixed Both random A fixed, B random B nested within A ***************************************************/
More informationMultiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
More informationRECRUITERS PRIORITIES IN PLACING MBA FRESHER: AN EMPIRICAL ANALYSIS
RECRUITERS PRIORITIES IN PLACING MBA FRESHER: AN EMPIRICAL ANALYSIS Miss Sangeeta Mohanty Assistant Professor, Academy of Business Administration, Angaragadia, Balasore, Orissa, India ABSTRACT Recruitment
More informationSPSS: Descriptive and Inferential Statistics. For Windows
For Windows August 2012 Table of Contents Section 1: Summarizing Data...3 1.1 Descriptive Statistics...3 Section 2: Inferential Statistics... 10 2.1 ChiSquare Test... 10 2.2 T tests... 11 2.3 Correlation...
More informationCOMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.
277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies
More informationAnalysis of Variance ANOVA
Analysis of Variance ANOVA Overview We ve used the t test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.
More information