ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA




Jiahe Qian, Educational Testing Service
Rosedale Road, MS 02-T, Princeton, NJ 08541

Key Words: Complex sampling, NAEP data, Jackknife, Satterthwaite's formula, Shrinkage estimate

In the analysis of complex survey data, studies of trends and comparisons usually involve t-type tests and confidence intervals. The National Assessment of Educational Progress (NAEP), which uses a multistage stratified probability sample design, provides examples of these studies. NAEP is designed to assess and report on student academic achievement and educational trends in the United States.

In t-type tests, one basic issue is the calculation of the variances of the statistics of interest. Several approaches are often employed in practice, such as interpenetrating subsamples, jackknife repeated replication (JRR), balanced repeated replication (BRR), the bootstrap, and the Taylor series method. In NAEP, the paired JRR is used to estimate variances.

Another practical issue in t-type tests is the estimation of the effective degrees of freedom for the variances. Satterthwaite's formula, discussed in Section 1.1, can be used to obtain the effective number of degrees of freedom for variances estimated by these approaches (Rao & Scott, 1981; Johnson & Rust, 1992). In application, however, Satterthwaite's formula has some deficiencies. First, it tends to underestimate the effective degrees of freedom (see Section 1.2); Johnson and Rust (1992) use an adjusted formula that was determined empirically. Second, the estimate of the effective degrees of freedom can be unstable (see Section 1.3); examples were found in the statistical tests conducted for the 1996 NAEP long-term trend assessments. To verify the causes of the downward bias and the instability, a computer simulation was carried out (see Section 1.4).
The objective of this research is to find approaches that produce stable estimates of the effective degrees of freedom; these are discussed in Sections 2.1-2.3.

This research was partially supported under the National Assessment of Educational Progress (Grant No. R999G50001) as administered by the Office of Educational Research and Improvement, U.S. Department of Education. The author thanks Eugene Johnson and Jim Carlson for many helpful comments.

1. THE ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM

1.1 Satterthwaite's Formula

In the analysis of independent samples, the number of degrees of freedom for the variance of a linear combination of several estimates is not simply the sum of the numbers of degrees of freedom for each estimate. One estimate of the effective number of degrees of freedom is obtained by matching estimates of the first two moments of the variance to those of a chi-square random variable (Satterthwaite, 1941, 1946; Cochran, 1977). The degrees of freedom measure the stability of a variance estimator. For a linear combination $x = \sum_{j=1}^{m} x_j$ of independent, normally distributed estimates, the effective degrees of freedom of the variance of $x$ are

$$df_s = \frac{\left(\sum_{j=1}^{m} s_{x_j}^2\right)^2}{\sum_{j=1}^{m} s_{x_j}^4 / (n_j - 1)}$$

(Satterthwaite, 1941), where $s_{x_j}^2 = \frac{1}{n_j - 1} \sum_{i=1}^{n_j} (x_{ji} - \bar{x}_j)^2$ and $n_j$ is the sample size of the $j$th subsample. The effective $df_s$ always lies between the smallest of the $n_j - 1$ and their sum. For a weighted combination $\sum_{j=1}^{m} w_j x_j$ of the $x_j$,

$$df_s = \frac{\left(\sum_{j=1}^{m} w_j^2 s_{x_j}^2\right)^2}{\sum_{j=1}^{m} w_j^4 s_{x_j}^4 / (n_j - 1)}.$$

A special case is where all the $n_j$ equal 2. Then

$$df_s = \frac{\left(\sum_{j=1}^{m} s_{x_j}^2\right)^2}{\sum_{j=1}^{m} s_{x_j}^4}.$$

Obviously, $1 \le df_s \le m$. A smaller $df_s$ makes hypothesis tests and confidence intervals more conservative than the traditional ones.

Satterthwaite's formula can be used to calculate the effective number of degrees of freedom for variances estimated by the interpenetrating subsamples, jackknife, and BRR approaches.

In the paired jackknife procedure, two primary sampling units (PSUs) are selected from each stratum. In NAEP studies, PSUs usually form 62 pairs. With a paired selection design, one PSU is dropped from stratum 1 at random; then the weights of the elements in the other PSU in that stratum are doubled. Based on this set of weights, a statistic $\hat{\theta}_1$, say the sample mean, can be calculated to estimate the population parameter $\theta$; $\hat{\theta}_1$ is called a pseudo-value. Repeat the process by dropping one PSU from each of the strata in turn, doubling the weights of the elements in the other PSU, and computing $\hat{\theta}_2, \hat{\theta}_3, \ldots, \hat{\theta}_m$. Then $V(\hat{\theta})$ is estimated by

$$v(\hat{\theta}) = \sum_{i=1}^{m} (\hat{\theta}_i - \hat{\theta})^2.$$

This procedure can readily be extended to designs with more than two sampled PSUs per stratum.

In NAEP samples, the jackknife variance is the sum of 62 squared deviations, which are equivalent to 62 pairs of independent samples in Satterthwaite's formula. Therefore, Satterthwaite's formula can be applied to estimate the degrees of freedom for jackknife variances. Although Satterthwaite's formula is broadly used in complex data analysis, Satterthwaite estimates have several deficiencies.

1.2 The Adjustment of Downward Bias in Estimation of the Effective Degrees of Freedom

One deficiency of Satterthwaite estimates is the underestimation of the effective degrees of freedom (Johnson & Rust, 1992). This introduces a downward bias into the estimation and causes tests to be too conservative. In NAEP data, the estimated effective degrees of freedom for the NAEP jackknife variance are sometimes noticeably smaller than the degrees of freedom attributed to the corresponding error estimates from conventional techniques that assume simple random sampling of students. When the unadjusted $df_s$ from the data in Table 2.1 are compared with the average $df_s$ from the simulation in Table 1.1, the downward bias is clear.

The simulation results show that the downward bias could be caused by a violation of the normality assumption. We found that the means and medians of $df_s$ when the PSU means follow a gamma distribution are smaller than those when the PSU means are normally distributed.
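As a rough illustration of Section 1.1, the sketch below computes the paired jackknife variance and the Satterthwaite degrees of freedom from a list of pseudo-values. The function names are hypothetical, and centering the deviations on a full-sample estimate is an assumption made for the example; this is not the NAEP production code.

```python
def satterthwaite_df(variances, sizes):
    """Effective df for a sum of independent estimates.

    variances: the s_j^2 terms; sizes: the subsample sizes n_j.
    Implements (sum s_j^2)^2 / sum(s_j^4 / (n_j - 1)).
    """
    numerator = sum(variances) ** 2
    denominator = sum(s2 ** 2 / (n - 1) for s2, n in zip(variances, sizes))
    if denominator == 0:
        raise ValueError("all variance components are zero; df is undefined")
    return numerator / denominator


def paired_jackknife(pseudo_values, full_estimate):
    """Return (variance, effective df) for a paired jackknife.

    Assumption: each squared deviation of a pseudo-value from the
    full-sample estimate plays the role of one s_j^2 with n_j = 2.
    """
    squared_devs = [(t - full_estimate) ** 2 for t in pseudo_values]
    variance = sum(squared_devs)
    df = satterthwaite_df(squared_devs, [2] * len(squared_devs))
    return variance, df


if __name__ == "__main__":
    # Hypothetical pseudo-values from four strata, for illustration only.
    variance, df = paired_jackknife([10.2, 9.8, 10.1, 9.9], 10.0)
    print(variance, df)
```

With equal squared deviations the function returns the number of pairs, and it returns less as the deviations become more uneven, which is the behavior bounded by 1 and m in the text.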
To remedy the downward bias, Johnson and Rust (1992) proposed an adjustment to $df_s$.

1.3 The Instability of the Estimates of the Effective Degrees of Freedom

Another drawback is the instability of the estimates produced by Satterthwaite's formula. For example, instability was found in the report on the 1996 NAEP 8th Grade Mathematics Long-Term Trend Across Assessments. Table 1.2 lists the estimated effective degrees of freedom for weighted proficiency scores, which are major measurements in NAEP studies. Table 1.2 shows that, in the 1992 NAEP mathematics long-term trend assessment, the effective degrees of freedom were 14.3 for male students and 43.8 for female students. The largest effective degrees of freedom for male students are 34.1 in 1990, more than twice the 14.3 of 1992. The coefficients of variation are [...] and [...] for male and female students, respectively. Such large coefficients of variation indicate instability. In general, the coefficients of variation are greater than or equal to 0.2. Similar problems were found in other NAEP data.

The problems of downward bias and instability may cause errors in statistical results and lead to misleading decisions, which is unacceptable.

1.4 The Empirical Distribution of the Effective Degrees of Freedom

The problems of underestimation and instability could have the following causes. First, in empirical survey data, the normality assumption behind Satterthwaite's formula could be violated. Second, the estimate of the effective degrees of freedom is a ratio of two high-order moment estimates, so the variance of Satterthwaite's estimates can be very large. Third, the sampling methods within each PSU are usually complex and could therefore introduce weighting and design effects.

To verify the causes of the problems, we used a Monte Carlo simulation to approximate the distribution of the estimates of the effective degrees of freedom.
Although normality is the assumption behind Satterthwaite's formula, random variables with normal, gamma, and uniform distributions were all employed in the simulation. The simulation was repeated 2000 times. In a typical simulation, the number of random variables (the group size) is set at 62, which is the same as the number of jackknife replicates in NAEP. The empirical distributions (see Figures 1-3) show that the effective degrees of freedom are approximately normally distributed. Table 1.1 lists the means of the degrees of freedom and their standard errors for a typical simulation. When these are compared with the Satterthwaite estimates from NAEP data in the first column of Table 2.1, a downward bias is easily seen. Table 1.1 also shows that the coefficients of variation from the simulation are relatively small, whereas those derived from the NAEP mathematics long-term trend assessments in Table 1.2 are large. This is evidence of the instability of Satterthwaite's estimates. Figure 4 shows the linear relationship between degrees of freedom and group sizes.

2. THE IMPROVEMENT IN INSTABILITY OF SATTERTHWAITE ESTIMATES

Several improvements are proposed here to solve the problem of instability.

2.1 A Moderate Number of Degrees of Freedom

To remedy the instability, the mean or median estimate of the effective degrees of freedom, $\overline{df}$, can be used as the effective degrees of freedom in hypothesis tests. We can obtain $\overline{df}$ by a computer simulation.

2.2 Composite Estimator

To improve the accuracy of the estimates of the effective degrees of freedom, a general composite estimator can be expressed as

$$df_c = \alpha \, df_\mu + (1 - \alpha) \, df_A,$$

where $df_\mu$ is the mean of the simulation distribution and $df_A$ is the improved estimate of the effective degrees of freedom (Johnson & Rust, 1992).

2.3 Optimal Shrinkage Estimator

Treating $df_\mu$ as a "model-based" estimator, an optimal composite estimator can be defined with the weight

$$\alpha_{cs} = \begin{cases} A_m / (A_u + A_m) & \text{if } A_m \ge 0 \text{ and } A_u \ge 0, \\ 1 & \text{if } A_u < 0, \\ 0 & \text{otherwise,} \end{cases}$$

with $A_m \approx V(df_A) + \mathrm{Bias}^2(df_A)$ and $A_u \approx V(df_\mu)$. This is a special case of the optimal composite estimator proposed by Cohen and Spencer (1991). In calculation, $\alpha_{cs}$ can be estimated by using sample moments for $A_m$ and $A_u$, and $V(df_A)$ can be estimated by the jackknife approach.

This shrinkage estimator was used to estimate the effective degrees of freedom for the mean scale scores in NAEP assessments. The resulting shrinkage estimates are listed in the second and third columns of Table 2.1. They are stable and also avoid the overadjustments that are sometimes caused by the adjustment of Johnson and Rust (1992).

REFERENCES

Cochran, W. G. (1977). Sampling Techniques, 3rd ed. John Wiley & Sons, New York.

Cohen, T., & Spencer, B. (1991). "Shrinkage Weights for Unequal Probability Samples," Proceedings of the Section on Survey Research Methods, 625-630.

Johnson, E., & Rust, K. (1992). "Effective Degrees of Freedom for Variance Estimates from a Complex Sample Survey," Proceedings of the Section on Survey Research Methods, American Statistical Association.
Qian, J., & Spencer, B. (1993). "Optimally Weighted Means in Stratified Sampling," Proceedings of the Section on Survey Research Methods, American Statistical Association, 863-866.

Rao, J. N. K., & Scott, A. J. (1981). "The Analysis of Categorical Data From Complex Sample Surveys: Chi-Squared Tests for Goodness of Fit and Independence in Two-Way Tables," Journal of the American Statistical Association, 76, 221-230.

Satterthwaite, F. E. (1941). "Synthesis of Variance," Psychometrika, 6(5), 309-316.

Satterthwaite, F. E. (1946). "An Approximate Distribution of Estimates of Variance Components," Biometrics, 2, 110-114.

Table 1.1 The Means, Medians and Standard Errors for Three Distributions (In Simulation: N = 2000; Group Size: m = 62)

Distribution        Mean     Median   STD
Normal: N(0, 1)     22.10    22.39    3.72
Gamma: G(2, 1)      15.17    14.86    5.47
Uniform: U(0, 1)    33.98    33.97    2.93
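The kind of simulation summarized in Table 1.1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's program: each replication draws 62 values, takes squared deviations from the group mean as the variance components, and applies the n_j = 2 form of Satterthwaite's formula. The function names, the seed, and the centering on the group mean are all assumptions, so the output need not match Table 1.1 exactly.

```python
import random
import statistics


def satterthwaite_df(values):
    """(sum d_i^2)^2 / sum d_i^4, with d_i the deviation from the group mean."""
    mean = sum(values) / len(values)
    squared = [(v - mean) ** 2 for v in values]
    return sum(squared) ** 2 / sum(s * s for s in squared)


def simulate(draw, group_size=62, replications=2000, seed=0):
    """Return (mean, median, standard deviation) of the simulated df values."""
    rng = random.Random(seed)  # private generator so runs are repeatable
    dfs = [
        satterthwaite_df([draw(rng) for _ in range(group_size)])
        for _ in range(replications)
    ]
    return statistics.mean(dfs), statistics.median(dfs), statistics.stdev(dfs)


# The three distributions named in Table 1.1.
DISTRIBUTIONS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "gamma": lambda rng: rng.gammavariate(2.0, 1.0),  # shape 2, scale 1
    "uniform": lambda rng: rng.random(),
}

if __name__ == "__main__":
    for name, draw in DISTRIBUTIONS.items():
        mean, median, std = simulate(draw)
        print(f"{name:8s} mean={mean:.2f} median={median:.2f} std={std:.2f}")
```

Heavier-tailed inputs inflate the fourth-power sum in the denominator, which is why the gamma case is expected to give the smallest degrees of freedom and the uniform case the largest.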

Table 1.2 The Estimated Effective DF for the Variance of Mean Proficiency Scores in NAEP Mathematics Long-Term Trend Across Assessments (Age 17)

                      1978   1982   1986   1990   1992   1994   1996
Sex
  Male                27.4   28.0   23.0   34.1   14.3   33.0   20.7
  Female              25.9   22.1   19.3   39.5   43.8   18.6   12.1
Ethnicity
  White               30.0   19.0   17.8   24.5   28.6   25.2   23.3
  Black               30.8   18.4   18.5   10.7   26.8   24.5   29.8
  Hispanic             6.8    7.1    6.4   12.8    9.1    7.0   19.5
  Other               17.0    3.3    6.2   1      21.7    6.3    5.0
Region
  Northeast           16.1    7.2    8.8   28.2   11.3   16.6   12.9
  Southeast            8.0    6.0   14.2    9.0   14.6   12.4   15.5
  Central              6.0    5.1   14.4   13.6    7.8    9.1   26.0
  West                 4.4    5.2    7.3   23.5    9.7    6.7    9.7
Taken Computer Pgm
  Have                 7.8   26.2   11.0   28.9   28.6   21.5   21.2
  Have not            31.4   21.6   26.7   33.8   39.8   32.9   38.2

CV: 0.2 0.5 0.2

Table 2.1 The Composite Estimates of DF for the Variance for Mean Proficiency Scores in 1996 NAEP Eighth Grade Mathematics Assessments

Columns: (1) Not adjusted df_s; (2) Composite estimate for not adjusted df_s; (3) Adjusted df_s; (4) Composite estimate for adjusted df_s

                      (1)    (2)    (3)    (4)
Sex
  Male                11.6   21.0   32.5   23.1
  Female              15.6   2      43.9   22.7
Ethnicity
  White, Black, Hispanic, Asian: 9.9 6.5 6.1 7.8 21.1 21.2 27.9 18.3 17.1 22.0 23.5 2 2 22.0
Region
  Northeast            3.8   21.4   10.7   21.0
  Southeast, Central, West: 5.6 9.0 8.2 21.1 21.2 15.8 25.3 22.9 20.8 23.8 22.8
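The composite estimates of Sections 2.2 and 2.3 amount to a weighted average of a simulation mean and a data-based estimate. The sketch below shows that weighting under stated assumptions: the function names and the numbers in the usage lines are hypothetical, the middle branch of the weight is read here as applying when the estimated A_u is negative, and the case where both terms are zero (not covered in the text) is treated as an error.

```python
def shrinkage_weight(a_m, a_u):
    """Weight placed on the simulation mean df_mu.

    a_m: estimated V(df_A) + Bias^2(df_A), the error of the data-based estimate.
    a_u: estimated V(df_mu), the error of the simulation mean.
    Sample-moment estimates of either term can come out negative.
    """
    if a_m >= 0 and a_u >= 0:
        total = a_m + a_u
        if total == 0:
            # Assumption: the text does not define this case.
            raise ValueError("a_m and a_u are both zero; weight is undefined")
        return a_m / total
    if a_u < 0:
        # Assumption: a negative a_u puts all weight on df_mu.
        return 1.0
    return 0.0


def composite_df(df_mu, df_a, a_m, a_u):
    """alpha * df_mu + (1 - alpha) * df_A with the shrinkage weight alpha."""
    alpha = shrinkage_weight(a_m, a_u)
    return alpha * df_mu + (1.0 - alpha) * df_a


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the direction of the shrinkage.
    print(composite_df(df_mu=22.0, df_a=10.0, a_m=3.0, a_u=1.0))
```

The noisier the data-based estimate is relative to the simulation mean, the closer the weight moves to 1 and the more the composite is pulled toward the simulation mean.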

Figure 1. The Distribution of Degree of Freedom: Proficiency Is Normally Distributed

Figure 2. The Distribution of Degree of Freedom: Proficiency Has a Gamma Distribution

Figure 3. The Distribution of Degree of Freedom: Proficiency Is Uniformly Distributed

Figure 4. Relationship between DF and Group Sizes: Proficiency Is Normally Distributed (horizontal axis: Group Sizes)