Point Biserial Correlation Tests

Similar documents
Pearson's Correlation Tests

Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Tests for Two Proportions

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Tests for One Proportion

Non-Inferiority Tests for One Mean

Non-Inferiority Tests for Two Means using Differences

HYPOTHESIS TESTING: POWER OF THE TEST

Non-Inferiority Tests for Two Proportions

Confidence Intervals for the Difference Between Two Means

Confidence Intervals for Cp

Confidence Intervals for One Standard Deviation Using Standard Deviation

Hypothesis testing - Steps

Confidence Intervals for Cpk

Confidence Intervals for Spearman s Rank Correlation

Using Excel for inferential statistics

Two Correlated Proportions (McNemar Test)

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

Randomized Block Analysis of Variance

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

NCSS Statistical Software

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Confidence Intervals for Exponential Reliability

Binary Diagnostic Tests Two Independent Samples

Lin s Concordance Correlation Coefficient

Projects Involving Statistics (& SPSS)

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

Data Analysis Tools. Tools for Summarizing Data

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses

Gamma Distribution Fitting

Chapter 3 RANDOM VARIATE GENERATION

Simple Linear Regression Inference

NCSS Statistical Software

NCSS Statistical Software. One-Sample T-Test

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

containing Kendall correlations; and the OUTH = option will create a data set containing Hoeffding statistics.

Univariate Regression

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

Paired T-Test. Chapter 208. Introduction. Technical Details. Research Questions

Chapter 23. Inferences for Regression

Part 2: Analysis of Relationship Between Two Variables

Module 4 (Effect of Alcohol on Worms): Data Analysis

Regression and Correlation

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

Simple Regression Theory II 2010 Samuel L. Baker

3.4 Statistical inference for 2 populations based on two samples

Using Microsoft Excel to Plot and Analyze Kinetic Data

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

Describing Populations Statistically: The Mean, Variance, and Standard Deviation

Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information.

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Chapter 5 Analysis of variance SPSS Analysis of variance

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Testing for differences I exercises with SPSS

Probability Calculator

Homework 11. Part 1. Name: Score: / null

EXCEL Analysis TookPak [Statistical Analysis] 1. First of all, check to make sure that the Analysis ToolPak is installed. Here is how you do it:

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

Regression Analysis: A Complete Example

An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS

Multivariate Analysis of Variance (MANOVA)

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

SPSS Guide: Regression Analysis

Chapter 7 Section 7.1: Inference for the Mean of a Population

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Bill Burton Albert Einstein College of Medicine April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

UNDERSTANDING THE DEPENDENT-SAMPLES t TEST

Paired 2 Sample t-test

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

Two Related Samples t Test

Spreadsheets and Laboratory Data Analysis: Excel 2003 Version (Excel 2007 is only slightly different)

Principles of Hypothesis Testing for Public Health

Guide to Microsoft Excel for calculations, statistics, and plotting data

Minitab Tutorials for Design and Analysis of Experiments. Table of Contents

Using Excel in Research. Hui Bian Office for Faculty Excellence

UNDERSTANDING THE TWO-WAY ANOVA

17. SIMPLE LINEAR REGRESSION II

SPSS Tests for Versions 9 to 13

TI-Inspire manual 1. Instructions. Ti-Inspire for statistics. General Introduction

January 26, 2009 The Faculty Center for Teaching and Learning

Calculating, Interpreting, and Reporting Estimates of Effect Size (Magnitude of an Effect or the Strength of a Relationship)

How To Run Statistical Tests in Excel

Using Excel for Statistics Tips and Warnings

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

The Dummy s Guide to Data Analysis Using SPSS

Introduction to Hypothesis Testing

Prism 6 Step-by-Step Example Linear Standard Curves Interpolating from a standard curve is a common way of quantifying the concentration of a sample.

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Data exploration with Microsoft Excel: analysing more than one variable

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Transcription:

Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable (Y) and a binary random variable (X). This correlation is related to, but different from, the biserial correlation proposed by Karl Pearson. In psychology, the point biserial correlation is often used as a measure of the degree of association between a trait or attribute and a measureable characteristic such as an ability to accomplish something. Since it is a correlation, ρ ranges between plus and minus one. However, because of the discrete variable, the actual upper limit may be far less than one. When ρ is used as a descriptive statistic, no special distributional assumptions need to be made about the variables (Y and X). When hypothesis tests are made, it is assumed that the observation pairs are independent and that the values of Y are distributed normally conditional on the value of X. The distribution of Y when X =1 is normal with mean μ 1 and variance σ 2, while the distribution of Y when X = 0 is normal with mean μ 0 and variance also σ 2. If X is the result of a Bernoulli trial with probability of success (X = 1) p, then the design is said to be random. If X is set in advance, then the design is said to be fixed. Difference between Linear Regression and Correlation The point biserial correlation coefficient discussed in this chapter assumes that both X and Y are random variables. In the linear regression context, no statement is made about the distribution of X. In fact, X is not even a random variable. Instead, the values of X are set as part of the design. For example, a design might call for 20 men and 20 women to be included. Even though the same formula is used in this case, the results follow a different distribution with different sample size requirements. The analysis would then be termed linear regression and that procedure should be used to determine sample size and power. Technical Details The following results are found in Lev(1949) and Tate (1954). A random sample of n subjects is measured for the presence or absence of the trait (X) and the level of an ability (Y). This gives rise to n pairs of observations: (X i, Y i ), i = 1, 2,, n. Sample Point Biserial Correlation Coefficient The point biserial correlation coefficient, r, is calculated using the common product-moment correlation 807-1

(X X )(Y Y ) r = (X X ) 2 (Y Y ) 2 (Y 1 Y 0 ) n 1n 0 n = n i=1(y i Y ) 2 Random Design If it is assumed that 1. The binomial variable X takes on the value 1 with probability p and 0 with probability q = 1 p. 2. The condition distribution of Y given X = 1 is N(μ 1, σ) and the condition distribution of Y given X = 0 is N(μ 0, σ). The Tate (1954) provides results for the test statistic t calculated as t = r n 2 1 r 2 When ρ is 0, t follows Student s t distribution with n 2 degrees of freedom. When ρ is not 0, the distribution of t is a weighted sum of non-central t distributions each with degrees of freedom n 2 and noncentrality parameter δ R given by δ R = The weights are based on the binomial distribution of X. ρ 1 ρ n 1n 0 2 npq Thus, the power of an upper, one-sided test of H 0 : ρ = ρ 0 vs H 1 : ρ > ρ 0 computed at ρ = ρ 1 is n φ(p, ρ 1 ) = n n 1 n 1 =0 p n 1q n 0 h(t; n 1, n, p, ρ 1 )dt t α where h( ) is the density of the non-central t distribution with n 2 degrees of freedom and non-centrality δ R, and t α is chosen so that φ(p, ρ 0 ) = α. The sample size can be solved from the power function using a binary search algorithm. Procedure Options This section describes the options that are specific to this procedure. These are located on the Design tab. For more information about the options of other tabs, go to the Procedure Window chapter. Design Tab The Design tab contains most of the parameters and options that you will be concerned with. Solve For Solve For This option specifies the parameter to be calculated from the values of the other parameters. Under most conditions, you would either select Power or Sample Size. 807-2

Select Sample Size when you want to determine the sample size needed to achieve a given power and alpha error level. Select Power when you want to calculate the power. Dichotomous Variable Type Assume X s are This option specifies whether the X s are Random or Fixed. Select Random when X is the realization of a chance event. Select Fixed when X is set for each row by the experimenter (no chance event). Test Direction Alternative Hypothesis This option specifies the alternative hypothesis. This implicitly specifies the direction of the hypothesis test. The null hypothesis is H 0 : ρ0 = ρ1. Note that the alternative hypothesis enters into power calculations by specifying the rejection region of the hypothesis test. Its accuracy is critical. Possible selections for H 1 are: ρ1 ρ0 This is the most common selection. It yields the two-tailed test. Use this option when you are testing whether the correlation values are different, but you do not want to specify beforehand which correlation is larger. ρ1< ρ0 This option yields a one-tailed test. ρ1> ρ0 This option also yields a one-tailed test. Power and Alpha Power This option specifies one or more values for power. Power is the probability of rejecting a false null hypothesis, and is equal to one minus Beta. Beta is the probability of a type-ii error, which occurs when a false null hypothesis is not rejected. In this procedure, a type-ii error occurs when you fail to reject the null hypothesis of equal correlations when in fact they are different. Values must be between zero and one. Historically, the value of 0.80 (Beta = 0.20) was used for power. Now, 0.90 (Beta = 0.10) is also commonly used. A single value may be entered here or a range of values such as 0.8 to 0.95 by 0.05 may be entered. Alpha This option specifies one or more values for the probability of a type-i error. A type-i error occurs when you reject the null hypothesis of equal correlations when in fact they are equal. Values of alpha must be between zero and one. Historically, the value of 0.05 has been used for alpha. This means that about one test in twenty will falsely reject the null hypothesis. You should pick a value for alpha that represents the risk of a type-i error you are willing to take in your experimental situation. 807-3

You may enter a range of values such as 0.01 0.05 0.10 or 0.01 to 0.10 by 0.01. Sample Size N (Sample Size) This option specifies the number of observations in the sample. Each observation is made up of two values: an X value of 0 or 1 and a continuous Y value. The minimum value is 4. Point Biserial Correlations ρ0 (Correlation H0) Specify the value of ρ0. Note that the range of the correlation is between plus and minus one. This value is usually set to zero. ρ1 (Correlation H1) Specify the value of ρ1, the population correlation under the alternative hypothesis. Note that the range of the correlation is between plus and minus one. The difference between ρ0 and ρ1 is being tested by this significance test. You can enter a range of values separated by blanks or commas. Dichotomous Variable (X) P (Probability X = 1) Specify the value of p, the probability that X = 1 when you have a random design. Since this is a probability, it must be between 0 and 1. 807-4

Example 1 Finding the Power Suppose a study will be run to test whether the point biserial correlation between a random binary variable (X) and continuous variable (Y) is significantly different from zero. The researchers want to investigate what the power will be for a variety of sample sizes (5, 10, 20, 40, 80, 140, 200, 250) when alpha is 0.50. They want to calculate the power when ρ1 is actually 0.2, 0.4, and 0.6. Setup This section presents the values of each of the parameters needed to run this example. First, from the PASS Home window, load the procedure window by expanding Correlation, then Correlation, then clicking on Test (Inequality), and then clicking on. You may then make the appropriate entries as listed below, or open Example 1 by going to the File menu and choosing Open Example Template. Option Value Design Tab Solve For... Power Assume X s are... Random Alternative Hypothesis... ρ0 ρ1 Alpha... 0.05 N (Sample Size)... 5 10 20 40 80 140 200 250 ρ0 (Correlation H0)... 0.0 ρ1 (Correlation H1)... 0.2 0.4 0.6 Plots Tab 2D Plots X-Y Plots... Click the Plot Setup button (Scatter Plot Format window appears) Y Axis Tab... Click on this tab. Y Axis Vertical appears Axis: Min:... Set to 0 Axis: Max:... Set to 1 OK button... Click to save the settings and close this window Plots Tab 3D Plots X-Y-Z Plots... Click the Plot Setup button (3D Surface Plot Format window appears) 3D Surfact Plot Tab... Click on this tab. 3D Surface Plot window appears Point Symbols... Uncheck this option Y Axis Tab... Click on this tab. Y Axis Vertical appears Axis: Min:... Set to 0 Axis: Max:... Set to 1 OK button... Click to save the settings and close this window Annotated Output Click the Calculate button to perform the calculations and generate the following output. 807-5

Numeric Results Numeric Results Assuming the Dichotomous Variable is Random and H1: ρ0 ρ1 Point Point Sample Biserial Biserial Prob Size Corr H0 Corr H1 X=1 Power N ρ0 ρ1 Alpha P 0.0602 5 0.0000 0.2000 0.0500 0.5000 0.0843 10 0.0000 0.2000 0.0500 0.5000 0.1345 20 0.0000 0.2000 0.0500 0.5000 0.2373 40 0.0000 0.2000 0.0500 0.5000 0.4334 80 0.0000 0.2000 0.0500 0.5000 0.6663 140 0.0000 0.2000 0.0500 0.5000 0.8174 200 0.0000 0.2000 0.0500 0.5000 0.8941 250 0.0000 0.2000 0.0500 0.5000............ Report Definitions Power is the probability of rejecting a false null hypothesis. N is the size of the sample drawn from the population. ρ0 is the value of the point biserial correlation under the null hypothesis (H0). ρ1 is the value of the point biserial correlation under the alternative hypothesis (H1). Alpha is the probability of rejecting a true null hypothesis. P is the probability that the dichotomous X = 1. Summary Statements A sample size of 5 achieves 6% power to detect the difference between the null hypothesis point biserial correlation of 0.0000 and the alternative hypothesis point biserial correlation of 0.2000 using a two-sided hypothesis test with a significance level of 0.0500. The probability that the dichotomous variable will be equal to 1 is assumed to be 0.5000. This report shows the values of each of the parameters, one scenario per row. The values from this table are plotted in the charts below. Plots Section These charts show both a two-dimensional and a three-dimensional depiction of the relationship between power, sample size, and ρ1. 807-6

Example 2 Validation using Tate Tate (1955) page 1083 presents an example in which the power of a point biserial correlation coefficient is calculated. This example sets N = 10, alpha = 0.10, p = 1/3, ρ0 = 0, and ρ1 = 0.707. Tate calculates a power of 83.2% for a two-sided test. Setup This section presents the values of each of the parameters needed to run this example. First, from the PASS Home window, load the procedure window by expanding Correlation, then Correlation, then clicking on Test (Inequality), and then clicking on. You may then make the appropriate entries as listed below, or open Example 1 by going to the File menu and choosing Open Example Template. Option Value Design Tab Solve For... Power Assume X s are... Random Alternative Hypothesis... ρ0 ρ1 Alpha... 0.1 N (Sample Size)... 10 ρ0 (Correlation H0)... 0.0 ρ1 (Correlation H1)... 0.707 Output Click the Calculate button to perform the calculations and generate the following output. Numeric Results Numeric Results Assuming the Dichotomous Variable is Random and H1: ρ0 ρ1 Point Point Sample Biserial Biserial Prob Size Corr H0 Corr H1 X=1 Power N ρ0 ρ1 Alpha P 0.8351 10 0.0000 0.7070 0.1000 0.3333 The power of 0.8351 matches Tate s results to two decimal places. This is very good considering Tate explains in the article that they were using an approximation for the non-central t distribution. 807-7