Skewness of Data, T-Test, and Analysis of Variance

Scribed By: Vincent Ciaramella

In this lecture we will go into some detail about a few forms of data analysis.

Experiment Explanation

Last lecture, we looked at some example data for a hypothetical experiment in which several students drew a circle using either a mouse or a leap motion (gestural) interface. The time taken to complete this task was recorded. The data was recorded into a table whose columns were named as follows: Participant, Interface, Time Taken, and Contact Interface. Because the software used to analyze this data will not handle 'leap' vs. 'mouse' appropriately, we added the 'Contact Interface' column to represent the interface with 1s and 0s.

To clarify this data addition, I will supply an additional example. For an experiment that I am currently working on, we have datasets for different musical instruments. They were previously encoded as piano, guitar, drum, etc. and then mapped to an additional column which holds piano as the number 0, guitar as 1, etc. This extra (numerical) column makes working with R an easier task.

Analysis Explanation

The focus of our analysis will be to understand whether the independent variable (type of interface) has an effect on the dependent variable (time taken). Thus, we would like to see whether using the mouse led to a slower time than using the leap motion device. There are a few different ways in which we could test for this. The two we will cover are t-tests and one-way analysis of variance (ANOVA). Think of these as tools. You should reference a good statistics book to become more familiar with them, but here we will just cover how to analyze the data and use the results.

Analysis using R

The work below was done in R. The R statements will be prefixed with '> ' while the comments and explanations will be prefixed by '// '.
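As an aside, the kind of recoding described in the Experiment Explanation can be sketched in a couple of lines of R. This is a hypothetical, self-contained illustration; the data frame demo and its values are made up for this sketch and are not part of the experiment data.

// Hypothetical sketch: derive a numeric 0/1 column from a text column of interface names.
> demo <- data.frame(Interface = c("mouse", "leap", "mouse", "leap"))
> demo$Contact.Interface <- ifelse(demo$Interface == "mouse", 0, 1)
> demo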

// Clear the workspace and any previous data
> rm(list=ls())
> ls()

// Next we load the data.
> file.name<-file.choose()
// The previous command will bring up a dialogue used to select the data file.
// Once a selection is made, the data will be read in.
> my.data<-read.csv(file.name,row.names=NULL)

// Now we will view the data using the following command.
> my.data

// Next we will run a t-test. Last class we found that our means are different. Now we want to see
// whether that difference arose by chance or is systematic (meaning that it is a result of the fact
// that there are two different types of interfaces). The t-test lets us assess how likely it is that
// a difference of means like this arose by chance rather than from the independent variable.

// We run a t-test on the third column (time taken) grouped by the second column (interface type) as shown below.
> t.test(my.data[,3]~my.data[,2])

// If instead we want to run a paired t-test, we would run the following command.
> t.test(my.data[,3]~my.data[,2],paired=TRUE)

T-Test

Now we will take a moment to look at the t-test results. The p value returned here tells us how likely it is that the difference in means in our current data set arose by chance. Roughly speaking, it is the probability of seeing such a difference if the null hypothesis (that there is no real difference between the means) were true. For the test data, the p value is 0.001163. This low p value (< 0.05) tells us that we would likely see this difference again in subsequent experiments. Thus we are confident that there is a difference in the time taken to do this task between these two types of interfaces.
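The write-up below quotes the mean time for each interface. As a minimal sketch, the group means (and standard deviations, which are often reported alongside them) can be pulled out of the data as follows, assuming my.data is still loaded as above; the actual values are not reproduced here.

// Mean and standard deviation of the third column (time taken) for each level of
// the second column (interface type).
> tapply(my.data[,3], my.data[,2], mean)
> tapply(my.data[,3], my.data[,2], sd)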

The way we want to report this finding would be to say: "There was a significant difference in the time taken on the leap motion controller compared to the mouse (t(3) = -12.2816, df = 3, p-value = 0.001163). The mean value for leap = ##, the mean value for mouse = ##." Informally, what we are saying is that we expect to see a difference of means greater than 95 percent of the time.

Degrees of Freedom

Now a note on the t(3) and df = 3 from the results above. This is how we express the degrees of freedom, which is a parameter of the t-test. From a practitioner's point of view, it is related to the number of participants you had. One reason to report it is that with a very large number of degrees of freedom, even a small difference will show up as statistically significant. So what readers want to see is that the degrees of freedom are at a level that is acceptable for the community.

Acceptable p values

It is worth noting that acceptable p values vary across fields. For example, medical research typically holds p values to a higher standard. In social and behavioral research, p values less than 0.05 are acceptable. Generally, the rule of thumb for an acceptable level is dictated by previous research. As an additional example, say you are conducting an experiment that builds on previous work. If the previous work used a t-test with a specified p value threshold, then your work should use either the same threshold or a stricter (smaller) one to reject the null.

A note on t-tests: there are many flavors of t-tests, and you should consult a statistics book to identify the one most relevant to your experiment. Examples would be paired t-tests (for within subjects), two-tailed t-tests, one-tailed t-tests, etc.
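As a rough illustration, some of these flavors correspond to arguments of R's t.test function. The lines below are a sketch only; they assume my.data is still loaded as above, and the "less" direction is chosen purely for illustration.

// Classic pooled-variance (Student's) two-sample t-test instead of the default Welch test.
> t.test(my.data[,3]~my.data[,2], var.equal=TRUE)
// One-tailed test; "greater" is the other option, and "two.sided" is the default.
> t.test(my.data[,3]~my.data[,2], alternative="less")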

Analysis of Variance (ANOVA)

Another method, equivalent to the t-test in the two-group case, is known as analysis of variance (ANOVA). We will go back to our sample data in R to cover an example of such an analysis.

// Here we run an analysis of variance on the third column grouped by the second column.
> result = aov(my.data[,3]~my.data[,2])
// Viewing the summary gives us a table of results.
> summary(result)

Once again we look at the p value (under Pr(>F)) to test the null hypothesis. A p value below the chosen threshold indicates a significant difference. The ANOVA reports two degrees-of-freedom values: one for the factor (the number of groups minus 1) and one for the residuals (the number of observations minus the number of groups). With the resulting p value of 0.00637 we can conclude that there is a significant difference between the two means. The way we would report this is: "There is a statistically significant difference in the time taken with the mouse relative to the leap interface (F(1, 3) = 16.8, p < 0.05). The mean value for leap = ##, the mean value for mouse = ##." If the p value had been greater than 0.05 (or our chosen threshold), then we could not conclude that a statistically significant difference between the means had been shown.

Publishing Results

Typically it is hard to publish a paper that does not report significant results. What usually happens is that you report the series of results you obtained and explain why you think they came out that way. The paper is basically saying: here is what we did in our lab and here is what we learned from it. By and large it is much easier to publish with significant results. Usually you would iterate on your design a few times if you did not get significant results.

More Levels of the Independent Variable

Now we will cover the case where our independent variable has more levels, by augmenting the previous example. Once again we go back to R.

// Here we clear the previous data and open a new data set. The new set contains data for a
// mouse, touchpad, and leap interface.
> rm(list=ls())
> file.name<-file.choose()
> my.data<-read.csv(file.name,row.names=NULL)

> my.data
// This will show the same format as before, but with mouse, touchpad, and leap in the interface column.

// Now we run the analysis of variance with those three interface types present.
> result=aov(my.data[,3]~my.data[,2])
> summary(result)

Once again our p value is less than 0.05. These results tell us that at least one of the pairwise differences among the three means is statistically significant. We don't yet know which one it is, but we know it exists. To report this finding, we say: "There is a significant main effect of interface on time taken (F(2, 22) = 16.8, p = 0.00738). The mean value for leap = ##, the mean value for mouse = ##, the mean value for touchpad = ##." Further analysis needs to be conducted to find which of these differences is significant. For this task we will use the Tukey HSD (Honest Significant Differences) function.

// Now we use Tukey HSD (Honest Significant Differences). This is what we use when there are more than two conditions.
> TukeyHSD(result,ordered=TRUE)
// These results will contain a p value for each pairwise difference.

Analysis with Multiple Independent Variables

Suppose the current experiment were augmented so that there were two specific tasks, each performed with all three types of interfaces. These tasks could be writing and drawing. We may then want to analyze the effect of the two factors at the same time. To do this we would use a two-way ANOVA.

As an additional example, consider an experiment where different footwear is being tested. One independent variable (factor) would be the footwear type (low-top versus high-top shoe), while another factor would be the task (a jumping versus a running task). In order to analyze the effect of the two factors at the same time, we would again use a two-way ANOVA.
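The lecture does not show code for this case, but a minimal sketch of a two-way ANOVA in R might look like the following. The data frame task.data and its columns Time, Interface, and Task are made-up names for this example; they assume a data set laid out with one row per trial.

// Hypothetical sketch of a two-way ANOVA with an interaction term.
> two.way <- aov(Time ~ Interface * Task, data = task.data)
> summary(two.way)
// The '*' fits both main effects and the Interface:Task interaction;
// use Interface + Task instead if only the main effects are of interest.
> TukeyHSD(two.way)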