Inferential Statistics & Correlational Research. Review/Clarification. Randomized Solomon Four-Group & Posttest Only, Control Group 7/7/2011.

Inferential Statistics & Correlational Research, Day 3

Review/Clarification: Randomized Solomon Four-Group & Posttest-Only Control Group

The Solomon four-group design controls for possible sensitization effects due to testing or maturation.

1. R O X O
2. R O O (maturation or testing?)
3. R X O (effect of pretest?)
4. R O (control group)

What do the symbols mean?
In a successful experiment, what would we expect for each group?
What if the posttest (PT) scores for group 2 were as high as the PT scores for group 1?
What if the PT scores for group 3 were lower than those for group 1?

2-Way Factorial Designs (2 independent variables, often one manipulated and one attribute)

2X2 (2 levels of both variables)

Method:    CAI                     Traditional
           High 1                  High 2
           Low 1                   Low 2

3X2 (3 levels of one variable and 2 levels of the other variable)

Method:    CAI                     Traditional
           High Achieving 1        High Achieving 2
           Moderate Achieving 1    Moderate Achieving 2
           Low Achieving 1         Low Achieving 2

What is a 3-way factorial design? (p. 168)

3X3 (3 levels of both variables)

Method:    CAI                     Traditional             Combination
           High Achieving 1        High Achieving 2        High Achieving 3
           Moderate Achieving 1    Moderate Achieving 2    Moderate Achieving 3
           Low Achieving 1         Low Achieving 2         Low Achieving 3

Equivalent Materials Design

[O Xma O] [O Xmb O] [O Xmc O]
ma = teaching method a
mb = teaching method b

In the Equivalent Materials Design, different but equivalent materials are represented throughout with Ma, Mb, Mc, Md. The treatments and repeats of treatments, X0 and X1, are applied and then observed. An extension of this design is the Latin-square (counterbalanced) design [next slide].

Looks for the amount of growth and possibly the effects of combinations of treatments and differences
May include 2 or more methods/materials
Group receives all methods
Methods/materials may be repeated

Counterbalanced Design (Latin Square)

Equivalent Materials Design example: tuning to a drone vs. tuba vs. clarinet stimulus

Band: OX(tu)O OX(drone)O OX(clar)O
Pre/post tests = individual tuning to the different stimuli. Success is measured with a tuner.
Most interested in growth within the group per treatment
Could subject another band to the same procedure with a different order
Also concerned with differences in total growth on the last pretest (treatment order)
Might the drone lead to more growth? Could introducing the drone second be more effective? (Need to use another group.)
How do we know that using just one would be just as effective or more effective?

Statistics Basics: Types of Data

Nominal/Categorical = numbers as labels
  Male/female
  Soprano/alto/tenor/bass
Ordinal = ranks
  Contest ratings
Interval = equal distance between each number
  Contest scores (1-100)
  Lacks a meaningful zero (0 on a test = no knowledge? 0 temperature = arbitrary) and meaningful ratios (2x as smart?)
Ratio = equal-interval data
  True zero possible (0 decibels, 0 money)
  Ratios can be calculated in a meaningful way [2x as loud, ½ the money, height, weight, depth (a lake can dry up) (?), etc.]

Inferential Statistics

Statistic = number describing a variable
Descriptive statistics = describe the data actually collected (the sample)
Inferential statistics = used when making inferences about a population based on the sample
The statistic used depends on the type of data and other assumptions
Statistics are used to compare and find differences

Two Types of Inferential Stats

Parametric
  Interval & ratio data
  Normal or near-normal curve (distribution)
  Equal variances (Levene's test)
  Sample reflects population (randomized)
  Most powerful
Non-Parametric
  Nominal & ordinal data
  Not a normal distribution (skewness or kurtosis)
  Unequal variances
  Less powerful
  More conservative

Statistical Significance

Probability of obtaining a result at least this large by chance alone, i.e., if the treatment had no effect
Expressed as p
  p < .1 = less than 10% (1/10) probability
  p < .05 = less than 5% (1/20) probability
  p < .01 = less than 1% (1/100) probability
  p < .001 = less than .1% (1/1000) probability
Computer software reports the actual p
Alpha level = probability level to be accepted as significant; set before the study begins
Statistical significance does not equal practical significance

Statistical Power

Likelihood that a particular test of statistical significance will lead to the rejection of a false null hypothesis
Parametric tests are more powerful than nonparametric tests. (Parametric tests are more likely to discover differences between groups. The choice depends on the type of data.)
The larger the sample size, the more likely you are to find statistically significant effects.
The less stringent your criterion (e.g., .05 vs. .01 vs. .001), the easier it is to find statistical significance.

Review: Hypothesis Testing

Research Hypothesis (H)
  The predicted outcome of the research study
  Example: The chunking method will be more effective than the whole song method in teaching a song by rote.
Null Hypothesis (Ho)
  An outcome in which no difference/relationship exists
  Example: There will be no difference in effectiveness between the chunking and whole song methods. (value free)

Review: Type I and Type II Error

Type I Error is erroneously claiming statistical significance, i.e., rejecting the null hypothesis when in fact it is true (claiming success when the experiment failed to produce results).
  Possible with an incorrect statistical test
  Or when conducting multiple tests on the same data (e.g., comparing 2 groups on multiple variables, such as the parts of an achievement test). [Solution: lower the alpha level]
Type II Error is when a researcher fails to reject the null hypothesis when it is in fact false.
  The smaller the sample size, the more difficult it is to detect statistical significance.
  In this case, a researcher could be missing an important finding because of the study design.
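To see why multiple tests on the same data call for a lower alpha level, the following minimal sketch computes the chance of at least one Type I error across several tests and the per-test alpha obtained by dividing alpha by the number of tests (the Bonferroni correction). It assumes the tests are independent, which is a simplification, and the function names are made up for illustration.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across n_tests tests.

    Assumes the tests are independent (a simplifying assumption).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha level after dividing by the number of tests."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05
    for n_tests in (1, 3, 5, 10):
        rate = familywise_error_rate(alpha, n_tests)
        corrected = bonferroni_alpha(alpha, n_tests)
        print(f"{n_tests:2d} tests: familywise error = {rate:.3f}, "
              f"corrected alpha = {corrected:.4f}")
```

With alpha at .05, three tests already push the chance of at least one false positive above 14%, which is the problem the lower alpha level is meant to offset.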

Statistical Tests - Parametric
http://faculty.vassar.edu/lowry/vassarstats.html

Parametric Assumptions

Interval data
Normality: scores are normally distributed in each group
Homogeneity of variance: the amount of variability in scores is similar between groups (Levene's test)

One- vs. Two-Tailed Tests

If a hypothesis is directional in nature, it is one-tailed.
  Example: The chunking method will be more effective than the whole song method.
If a hypothesis is not directional in nature, it is two-tailed.
  Example: There will be no difference in effectiveness between the chunking method and the whole song method.
Two-tailed tests are most commonly used since directional hypotheses are rare in music education research.
If the study is designed knowing that results can only go in one direction (e.g., beginning violin), a one-tailed test is OK.
If treatment can only lead to positive results (improvement), use a one-tailed test.
If treatment could result in positive or negative results, use a two-tailed test.
A one-tailed test is more powerful. If your experiment led to improvement but a two-tailed test only comes close to significance, a one-tailed test may reach significance; the choice of tails should, however, be made before looking at the results. (Specify which you used in your study.)
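The extra power of a one-tailed test can be illustrated with a short sketch. For simplicity it uses a z statistic and the standard normal distribution rather than a t distribution, which is an assumption made here and not something the slides specify; the function name and the example value of z are also hypothetical. The one-tailed p value assumes the predicted direction is positive.

```python
from statistics import NormalDist


def z_test_p_values(z):
    """Return (one_tailed_p, two_tailed_p) for a z statistic.

    The one-tailed p value assumes the hypothesis predicted a positive
    difference, so a negative z gives a large one-tailed p value.
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    one_tailed = 1.0 - standard_normal.cdf(z)
    two_tailed = 2.0 * (1.0 - standard_normal.cdf(abs(z)))
    return one_tailed, two_tailed


if __name__ == "__main__":
    one_tailed, two_tailed = z_test_p_values(1.8)  # made-up test statistic
    print(f"one-tailed p = {one_tailed:.4f}")
    print(f"two-tailed p = {two_tailed:.4f}")
```

When the result falls in the predicted direction, the one-tailed p value is half the two-tailed p value, so a result can fall below .05 one-tailed while staying above it two-tailed.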

Independent Samples t-test

Used to determine whether differences between two independent group means are statistically significant
  n ≤ 30 for each group, though many researchers have used the t test with larger groups.
  Groups do not have to be the same size.
  Only concerned with overall group differences, without considering pairs
  [A robust statistical technique is one that performs well even if its assumptions are somewhat violated by the true model from which the data were generated. Unequal variances = use an alternative t test or, better, the Mann-Whitney U]

Correlated (paired, dependent) Samples t-test

Used to determine whether differences between two means taken from the same group, or from two groups with matched pairs, are statistically significant
  e.g., pre-test achievement scores for the whole song group vs. post-test achievement scores for the whole song group
  The two sets of scores must be the same size (paired)
  n ≤ 30 for each group

ANOVA

Analyzes means of 2+ groups
Homogeneity of variance
Independent or correlated (paired) groups
More rigorous than the t test (between-group & within-group variance). Often used today instead of the t test.
F statistic
One-Way = 1 independent variable
Two-Way/Three-Way = 2-3 independent variables (one active & one or two attribute)
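As a rough illustration of the two t tests described above, the sketch below computes the t statistic and degrees of freedom by hand with the Python standard library. The scores are made up, the function names are hypothetical, and the independent-samples version assumes equal variances (the pooled-variance form). Finding the p value for a given t and df is left to a table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and df for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, df


def paired_t(first_scores, second_scores):
    """t statistic and df for paired scores (e.g., pretest and posttest)."""
    if len(first_scores) != len(second_scores):
        raise ValueError("paired samples must be the same size")
    diffs = [after - before
             for before, after in zip(first_scores, second_scores)]
    n = len(diffs)
    return mean(diffs) / sqrt(variance(diffs) / n), n - 1


if __name__ == "__main__":
    # Made-up achievement scores, for illustration only.
    chunking = [14, 16, 15, 18, 17]
    whole_song = [12, 13, 15, 11, 14]
    print("independent:", independent_t(chunking, whole_song))

    pretest = [12, 13, 15, 11, 14]
    posttest = [14, 16, 15, 18, 17]
    print("paired:", paired_t(pretest, posttest))
```

The same ten numbers give different t values in the two versions because the paired test works on each person's difference score, while the independent test only compares the overall group means.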

One-Way ANOVA

Calculate a One-Way ANOVA for the data set

Post Hoc Tests

Used to find where the differences between groups lie using one test. You could compare all pairs with individual t tests or ANOVAs, but that leads to problems with multiple comparisons on the same data.
Bonferroni correction: reduce alpha to compensate for multiple comparisons (.05 / N comparisons)
Tukey: equal sample sizes (though it can be used for unequal sample sizes as well)
Scheffé: unequal sample sizes (though it can be used for equal sample sizes as well)

Two-Way ANOVA (2X2)

Level      Method: CAI    Method: Traditional
High       Test           Test
Low        Test           Test

Interpreting Results of a 2x2 ANOVA

(columns) CAI was more effective than Traditional methods for both high and low achieving students
(rows) High achieving students scored significantly higher than low achieving students, regardless of teaching method
There was no significant interaction between rows & columns
If there were a significant interaction, we would need to do a post hoc Tukey or Scheffé test to determine where the differences lie.
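A minimal sketch of the one-way ANOVA calculation follows. It computes the F statistic as the ratio of between-group to within-group variance (mean squares), using only the Python standard library. The three groups of scores are invented for illustration and the function name is hypothetical; the p value for a given F would come from a table or a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)

    # Between-group sum of squares: how far each group mean is from the
    # grand mean, weighted by group size.
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2
                     for group in groups)
    # Within-group sum of squares: spread of scores around their own
    # group mean.
    ss_within = sum((score - mean(group)) ** 2
                    for group in groups for score in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up scores for three teaching methods.
    scores = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
    f_stat, df_b, df_w = one_way_anova_f(scores)
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A significant F only says that at least one group mean differs; a post hoc test is still needed to locate the difference.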

Three-Way (2x2x2) ANOVA

Factor A: Girls (A1), Boys (A2)
Factor B: Starting grade 4 (B1), starting grade 5 (B2)
Factor C: Lessons (C1), No Lessons (C2)

ANCOVA: Analysis of Covariance

Statistical control for unequal groups
Adjusts posttest means based on pretest means. [example] http://faculty.vassar.edu/lowry/vassarstats.html
[The homogeneity of regression assumption is met if within each of the groups there is a linear correlation between the dependent variable and the covariate and the correlations are similar between groups]

Effect Size (Cohen's d)
http://www.uccs.edu/~faculty/lbecker/es.htm

[(Mean of experimental group - Mean of control group) / average SD]
The average percentile standing of the average treated (or experimental) participant relative to the average untreated (or control) participant.
Use the table to find where someone ranked at the 50th percentile in the experimental group would be in the control group
Good for showing practical significance
  When the test is non-significant
  When both groups got significantly better (really effective vs. really, really effective!)

Calculate effect size:
Treatment group: M = 24.6; SD = 10.7
Control group: M = 10.8; SD = 7.77
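The effect size exercise above can be worked through with a short sketch. It follows the slide's formula (difference between means divided by the simple average of the two SDs); other texts divide by a pooled SD instead, which gives a slightly different value. The percentile step stands in for the lookup table and assumes normally distributed scores. Function names are hypothetical.

```python
from statistics import NormalDist


def cohens_d(mean_treatment, sd_treatment, mean_control, sd_control):
    """Cohen's d using the simple average of the two SDs."""
    average_sd = (sd_treatment + sd_control) / 2
    return (mean_treatment - mean_control) / average_sd


def percentile_standing(d):
    """Percentile of the average treated participant within the control
    group, assuming normally distributed scores."""
    return NormalDist().cdf(d) * 100


if __name__ == "__main__":
    d = cohens_d(24.6, 10.7, 10.8, 7.77)
    print(f"d = {d:.2f}")
    print(f"percentile standing = {percentile_standing(d):.0f}")
```

With these numbers d comes out a little under 1.5, so the average participant in the treatment group would score higher than roughly 93% of the control group.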

Correlational Research

Correlational Research Basics

Relationships among two or more variables are investigated
The researcher does not manipulate the variables
Direction (positive [+] or negative [-]) and degree (how strong) in which two or more variables are related

Uses of Correlational Research

Clarifying and understanding important phenomena (relationship between variables, e.g., height and voice range in middle school boys)
Explaining human behaviors (class periods per week correlated with practice time)
Predicting likely outcomes (one test predicts another)

Uses of Correlational Research (cont.)

Particularly beneficial when experimental studies are difficult or impossible to design
Allows for examination of relationships among variables measured in different units (decibels, pitch; retention numbers and test scores, etc.)
DOES NOT indicate causation
  Reciprocal effect (a change in weight may affect body image, but body image does not cause a change in weight)
  Third (other) variable actually responsible for the difference (the tendency of smart kids to persist in music is the cause of higher SATs among HS music students, rather than music study itself)

Interpreting Correlations

r = correlation coefficient (Pearson, Spearman)
  Can range from -1.00 to +1.00
Direction
  Positive: as X increases, so does Y, and vice versa
  Negative: as X decreases, Y increases, and vice versa
Degree or Strength (rough indicators, ignoring the sign of r)
  < .30: small
  < .65: moderate
  > .65: strong
  > .85: very strong
r² (% of shared variance)
  Overlap between two variables: the percent of the variation in one variable that is related to the variation in the other
Easy to obtain significant results with correlation. Strength is most important.

Interpreting Correlations (cont.)

Words typically used to describe correlations
  Direct (large values with large values or small values with small values; moving parallel; 0 to +1)
  Indirect or inverse (large values with small values; moving in opposite directions; 0 to -1)
  Perfect (exactly 1 or -1)
  Strong, weak
  High, moderate, low
  Positive, negative

Correlations vs. Mean Differences

Groups of scores that are correlated will not necessarily have similar means
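The following sketch shows one way to compute Pearson r and r² by hand and to attach the rough strength labels listed above. The data are made up (class periods per week and weekly practice hours), the function names are hypothetical, and treating exactly .30 and .65 as the start of the next band is an assumption, since the slide leaves the boundaries open.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (2 or more)")
    mean_x, mean_y = mean(x), mean(y)
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross_products / sqrt(ss_x * ss_y)


def describe_strength(r):
    """Rough verbal label based on the size of r, ignoring its sign."""
    size = abs(r)
    if size > 0.85:
        return "very strong"
    if size >= 0.65:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "small"


if __name__ == "__main__":
    # Made-up data, for illustration only.
    class_periods = [1, 2, 3, 4, 5]
    practice_hours = [2, 4, 5, 4, 5]
    r = pearson_r(class_periods, practice_hours)
    print(f"r = {r:.2f} ({describe_strength(r)})")
    print(f"r squared = {r ** 2:.2f} of the variance is shared")
```

Squaring r makes the point about shared variance concrete: a correlation that earns the label "strong" here still leaves about 40% of the variation in one variable unrelated to the other.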

Statistical Assumptions

The mathematical formulas used to determine the various correlation coefficients carry with them certain assumptions about the nature of the data used
  Independence (all)
  Level of data (all)
  Normal curve (Pearson; if not, use Spearman)
  Linearity (relationships move parallel or inverse)
    Young students initially have a low level of performance anxiety, but it rises with each performance as they realize the pressure and potential rewards that come with performance. However, once they have several performances under their belts, the anxiety subsides. (Nonlinear relationship between number of performances & anxiety scores)
  Presence of outliers (all)
  Homoscedasticity: relationship consistent throughout
    Anxiety levels off after several performances and remains static (relationship lacks homoscedasticity)
  Subjects have only one score for each variable
  Minimum sample size needed for significance. Can be calculated using power analysis (Pearson)

Correlational Approaches for Assessing Measurement Reliability

Consistency over time: test-retest (Pearson, Spearman)
Consistency within the measure: internal consistency (split-half, KR-20, Cronbach's alpha)
  Spearman-Brown prophecy formula: 2r / (1 + r)
Among judges: interjudge (Cronbach's alpha)
Consistency across forms of a measure (Pearson, Spearman)

Degree and Strength

r = .60 to .70 indicates marginal reliability
r = .70 to .85 indicates acceptable/good reliability
r = .85 to .99 indicates excellent reliability

Multiple Regression

Can be used to determine how well several independent variables predict a dependent variable
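The Spearman-Brown prophecy formula given above, 2r / (1 + r), can be applied in a couple of lines. In this sketch r is taken to be the correlation between the two halves of a test, and the result estimates the reliability of the full-length test. The function name and the example value of r are made up for illustration.

```python
def spearman_brown(split_half_r):
    """Estimated full-test reliability from a split-half correlation."""
    if split_half_r <= -1:
        raise ValueError("r must be greater than -1")
    return 2 * split_half_r / (1 + split_half_r)


if __name__ == "__main__":
    split_half_r = 0.60  # made-up correlation between the two test halves
    full_test = spearman_brown(split_half_r)
    print(f"split-half r = {split_half_r:.2f}, "
          f"estimated full-test reliability = {full_test:.2f}")
```

A split-half correlation of .60 steps up to .75, moving from the marginal range into the acceptable range on the scale listed above.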

For Monday

Turn in the first draft of the literature review and the reference list so far.
APA format
  Everything double spaced, 1-inch margins, 12 pt. font
  Reference list in alphabetical order by first author
Use online samples as a guide