Equating Student Scores Across Test Administration Modes: Grade 11 Mathematics MCA-III


Introduction

When assessments are delivered using multiple modes, such as online and paper-and-pencil, it is necessary to determine whether test scores across modes are comparable and, if not, to identify the linking constants necessary to place test scores from different modes onto a common scale. The first operational administration of the Grade 11 Mathematics MCA-III in spring 2014 was delivered to schools in both online and paper modes. Prior to reporting test scores for the spring 2014 mathematics assessments, a mode comparability study was performed to evaluate differences in test performance attributable to the mode of test administration and to identify the linking constants necessary to place item parameter estimates across modes on a common scale for test scoring. This document summarizes the mode comparability analyses conducted during the summer of 2014.

Background

Following the first operational administration of the Grade 11 Mathematics MCA-III in spring 2014, the American Institutes for Research (AIR) and the Minnesota Department of Education (MDE) conducted a mode comparability study to evaluate mode-based differences in student performance and to estimate the linking constants necessary to place paper- and computer-based item parameters, and the resulting ability estimates, on an equivalent scale.

The first operational online administration of the Grade 11 Mathematics MCA-III comprised six unique fixed forms to which students were assigned at random. MDE also produced one form for paper-based test administrations. The paper form was constructed to be as similar as possible to one of the online forms, and the similarity between those forms provided the basis for a study to evaluate the comparability of items administered in online and paper test modes. The matched online and paper forms shared 54 common items but differed in the following ways: (a) each form contained two operational test items that were not included on the other form, and (b) the forms contained different sets of items in the 16-item embedded field-test blocks. The 54 items common to both forms were used for the mode comparability study. The online matched form was administered to twice as many students as each of the other online forms to ensure an online student sample large enough to conduct the mode comparability study.

Table 1 shows the number of items used for the mode comparability study as well as the sample sizes for students taking the common form online and on paper. Note that these samples include only public school students with valid test records and no missing data on the matching variables used. Student records that did not meet the attemptedness criterion or were invalidated were excluded from all analyses.

Table 1: Number of Common Items and Number of Students Administered the Common Form

  Number of Common Items                            54
  Common Form Sample Size, Online Matched Form   5,692
  Common Form Sample Size, Paper Form            35,123

Methodology

Similar to the approach used previously by MDE, the mode comparability study for Grade 11 Mathematics MCA-III used a matched samples design (Way, Davis, & Fitzpatrick, 2006). Following this design, a reference sample was identified from one mode, and a matched sample of students responding in the alternate mode was drawn. The samples were matched on student achievement and on demographic information including gender, ethnicity, free or reduced-price lunch eligibility, and special education designation. The two samples were then equated using a common-item linking design.

Although about 35% of the student population participated in the online administration, only about 5,700 students were administered the common form online. Thus, the online sample for the mode comparability study comprised all students administered the common form online for whom a complete set of matching variable data was available. Following the procedures described below, a sample was then drawn from the paper test administration that matched the characteristics of students in the smaller online sample.

The following procedures were used to define the matched samples between the online and paper test administration modes.

1. For students participating in the paper test administration, 2014 Mathematics MCA-III raw scores were regressed on relevant achievement and demographic variables, including 2013 Reading MCA-III scores, ethnicity, gender, eligibility for free or reduced-price lunch, and special education designation, following the regression equation below:

   $$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_8 X_8$$

   where $Y$ is the predicted 2014 Mathematics MCA-III raw score, $\beta_n$ is the estimated regression weight for covariate $X_n$, and

   $X_1$ = 2013 Grade 10 MCA-III Reading theta score
   $X_2$ = American Indian-White contrast
   $X_3$ = Asian-White contrast
   $X_4$ = Hispanic-White contrast
   $X_5$ = African American-White contrast
   $X_6$ = Female-Male contrast
   $X_7$ = Free or Reduced-Price Lunch Eligibility contrast
   $X_8$ = Special Education contrast
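To make the regression step concrete, the sketch below fits a prediction equation of this form by ordinary least squares on simulated data. All values and variable names are hypothetical; this illustrates the modeling approach, not the operational code or data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical stand-in for the paper-mode calibration sample:
# X1 is the 2013 Reading theta score; X2-X8 are 0/1 contrast codes for
# ethnicity, gender, FRL eligibility, and special education status.
X = np.column_stack([
    rng.normal(size=n),               # X1: prior reading theta
    rng.integers(0, 2, size=(n, 7)),  # X2-X8: demographic contrasts
])
y = 33.6 + 6.35 * X[:, 0] + rng.normal(scale=7.0, size=n)  # toy raw scores

# Step 1: ordinary least squares regression of raw scores on the covariates.
design = np.column_stack([np.ones(n), X])          # add intercept column
beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # beta_0 ... beta_8

# Step 2 (below): the fitted equation yields a predicted paper-form raw
# score for every student in BOTH modes, used to form the ability groups.
y_hat = design @ beta
print("estimated weights:", np.round(beta, 3))
```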

2. Using the obtained regression weights, the prediction equation was applied to all students taking the common forms across both test administration modes, generating a predicted 2014 Grade 11 Mathematics MCA-III paper-form raw test score for each student. Using the predicted 2014 mathematics test scores, the online sample was then divided into 20 equal-sized ability groups. The predicted test score distribution cut points identified for the online sample were then applied to the paper-mode sample to form 20 ability subgroups for the paper sample as well.

The size of the paper mode comparability sample was limited only by the number of students available in the smallest of the paper-mode subgroups. Therefore, the smallest of the 20 paper-mode subgroups was identified and used to determine the target sample size for each of the 20 paper-mode subgroup samples. For each of the 20 ability subgroups in the paper-mode sample, a random sample of students equal to the target sample size was drawn without replacement. Thus, the paper-based sample was the largest matched sample that could be drawn without replacement such that the proportion of students at each predicted score level was equivalent to that of the online sample.

The larger paper matched sample was used in order to obtain the most stable item parameter estimates possible. This approach differed from that used in previous studies by MDE, in which bootstrap estimates of sampling error were generated using equivalent paper and online sample sizes to determine whether mode differences were significant. For the current analysis, the existence of mode differences was assumed, and the goal was to obtain the most precise estimate of them.
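The subgroup-matching step might be sketched as follows, given vectors of predicted scores for each mode. The function name and data are hypothetical, and the operational implementation may differ.

```python
import numpy as np

def matched_paper_sample(pred_online, pred_paper, rng, n_groups=20):
    """Draw a paper-mode sample whose predicted-score distribution matches
    the online sample's, following the 20-subgroup procedure described
    above (illustrative sketch; not the operational code)."""
    # Cut points splitting the ONLINE sample into 20 equal-sized groups.
    cuts = np.quantile(pred_online, np.linspace(0, 1, n_groups + 1)[1:-1])

    # Apply the same cut points to the paper-mode predicted scores.
    paper_bins = np.digitize(pred_paper, cuts)

    # The smallest paper subgroup fixes the per-group target sample size.
    counts = np.bincount(paper_bins, minlength=n_groups)
    target = counts.min()

    # Sample the target number of students without replacement per subgroup.
    chosen = [rng.choice(np.flatnonzero(paper_bins == g),
                         size=target, replace=False)
              for g in range(n_groups)]
    return np.concatenate(chosen)

# Example usage with simulated predicted scores (sizes echo Table 1).
rng = np.random.default_rng(1)
idx = matched_paper_sample(rng.normal(size=5_692),
                           rng.normal(0.1, 1.0, size=35_123), rng)
print(len(idx))  # 20 equal subgroups, limited by the smallest paper subgroup
```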

Regression Equations

The regression equation had an R-squared value of 0.53, indicating that substantial variation in the 2014 test scores could be accounted for by the achievement and demographic variables included in the model. Table 2 shows the raw and standardized regression coefficients for each variable entered into the model. In particular, even in the absence of a concurrent ability measure or a prior-year measure of the same construct, the 2013 reading achievement scores were predictive of 2014 mathematics achievement.

Table 2: Raw and Standardized Regression Coefficients in the Regression of G11 Math Raw Scores on Relevant Achievement and Demographic Variables

  Variable                  Regression     Standardized
                            Coefficient    Regression Coefficient
  Intercept                   33.6159         0.0000
  2013 MCA-III Reading         6.3504         0.6167
  Am. Indian - White          -2.0466        -0.0205
  Asian - White                1.6696         0.0425
  Hispanic - White            -1.9671        -0.0420
  Black - White               -2.4072        -0.0637
  Female - Male               -1.4968        -0.0698
  FRL - non-FRL               -2.6277        -0.1106
  SpecEd - non-SpecEd         -4.4087        -0.1107

Comparing the Matched Samples

Table 3 compares the demographic and achievement characteristics of the online and paper samples drawn for the mode comparability study. The table presents demographic and achievement characteristics for three groups: the full paper sample, the sample of paper administrations matched to the online sample, and the sample of online students taking the common form. For each sample, the table presents the proportion of students classified in each demographic category as well as the average raw score on the spring 2014 mathematics assessment. Note that the raw score summary is based on the 54 operational items common to the paper form and its matched online form. The results indicate that the demographic composition and prior reading achievement of the matched samples are quite similar and that the matching procedure was effective. Comparison of the observed raw score means on the 2014 Mathematics MCA-III for the matched samples supports the a priori assumption of a mode effect requiring adjustment.

Table 3: Demographic Comparisons of Matched Samples

  Demographics                             Full Paper      Matched Paper   Matched Online
                                           Sample          Sample          Sample
  Sample Size                              35,123          28,020          5,692
  Male                                     0.51            0.50            0.50
  Female                                   0.49            0.50            0.50
  White                                    0.76            0.75            0.80
  Black                                    0.09            0.10            0.06
  Hispanic                                 0.06            0.06            0.07
  Asian/Pacific Islander                   0.08            0.08            0.05
  American Indian                          0.01            0.01            0.02
  Free Lunch: Yes                          0.28            0.31            0.34
  Special Education: Yes                   0.08            0.09            0.09
  Mean (SD) of 2013 Reading Theta Score    0.1054 (1.04)   0.0020 (1.02)   0.0136 (1.03)
  Mean (SD) of 2014 Math Raw Score         31.0 (10.3)     30.3 (10.3)     27.9 (9.9)

Results

With the matched online and paper mode samples in hand, IRT item parameter estimates were obtained independently for each sample. The Stocking-Lord equating procedure (Stocking & Lord, 1983) was then used to place the item parameters obtained from the online sample onto the scale represented by the item parameters from the paper sample. Applying these mode linking constants to the online item parameter estimates produced the mode-corrected online scale, which served as the target scale for placing the paper item parameters. Differences between the item characteristic curves (ICCs) of the matched-sample paper parameters and the mode-corrected online parameters were then examined to evaluate whether the performance of any items was so different across modes that they should be dropped from the linking set. This review indicated that all available linking items should be included in the final linking set.

Table 4 shows the final Stocking-Lord slope and intercept estimates, which reflect the impact of mode on student performance across the 54 common test items. These constants were applied to the matched online sample parameter estimates to produce the target, mode-corrected online scale.

Table 4: Stocking-Lord Slope and Intercept from Matched Samples Equating

  Slope        Intercept    # Items
  0.966532     -0.249869    54

With the matched-sample, mode-corrected online item parameters as a target scale, the next step was to identify the Stocking-Lord constants for each step in the chain linking the item parameters in the paper bank to those in the online bank. The chain procedure was chosen to account for all possible item parameter differences between the two administration modes, as well as between the samples used to estimate them. Table 5 shows the Stocking-Lord constants for each step in the chain: linking the paper bank parameters to the item parameters based on the matched paper sample, linking the matched paper sample to the mode-corrected matched online sample, and linking the mode-corrected online sample to the mode-corrected online bank scale. The linking constants between the full paper and matched paper samples, as well as between the matched online sample and the full online bank, reflect differences in the population of schools and students participating across the paper and online test administration modes. The near-identity linking constants between the matched samples indicate that the matching procedure was effective.

Table 5: Stocking-Lord Slopes and Intercepts Linking Calibration and Matched Samples

  Link                                            Slope        Intercept    Items
  Full Paper to Sample Paper                      1.031134      0.026874    54
  Sample Paper to Mode-Corrected Online Sample    0.999813     -0.000008    54
  Online Sample to Full Online                    0.976204      0.051554    54
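As a reminder of what applying such constants involves: given a target scale theta* = A*theta + B, item difficulties transform as b* = A*b + B and discriminations as a* = a/A. The sketch below applies the Table 4 constants to a few hypothetical item parameters; it assumes a 2PL/3PL-style parameterization and is not the operational scoring code.

```python
import numpy as np

# Stocking-Lord mode linking constants from Table 4.
A, B = 0.966532, -0.249869

def apply_linking(a, b, A, B):
    """Place item parameters on the target scale theta* = A*theta + B:
    difficulties shift and rescale, discriminations rescale inversely.
    Any guessing parameters are unchanged."""
    return a / A, A * b + B

a_online = np.array([1.10, 0.85, 1.32])   # hypothetical discriminations
b_online = np.array([-0.40, 0.25, 1.10])  # hypothetical difficulties
a_adj, b_adj = apply_linking(a_online, b_online, A, B)
print(np.round(a_adj, 4))  # [1.1381 0.8794 1.3657]
print(np.round(b_adj, 4))  # [-0.6365 -0.0082  0.8133]
```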

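Because each Stocking-Lord link is a linear transformation of the theta scale, the chained links in Table 5 can be composed into a single transformation. The short check below, using only the constants published in Table 5, reproduces the aggregate values reported in Table 6 below.

```python
# Chain the three theta-scale transformations from Table 5, each of the
# form theta -> A*theta + B, applied in sequence along the linking chain.
links = [
    (1.031134, 0.026874),   # Full Paper -> Sample Paper
    (0.999813, -0.000008),  # Sample Paper -> Mode-Corrected Online Sample
    (0.976204, 0.051554),   # Online Sample -> Full Online
]

A_total, B_total = 1.0, 0.0
for A, B in links:
    # Compose (A, B) after the accumulated map (A_total, B_total).
    A_total, B_total = A * A_total, A * B_total + B

print(round(A_total, 6), round(B_total, 6))
# 1.006409 0.077776 -- the aggregate constants reported in Table 6.
```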
Table 6 shows the aggregate Stocking-Lord constants used to place the full-sample paper calibration item parameters onto the mode-corrected scale. Based on the matched sample procedure, these constants reflect differences between the paper and online samples that are not attributable to test administration mode.

Table 6: Aggregate Stocking-Lord Slope and Intercept

  Link                                 Slope        Intercept   Items
  Full Paper to Mode-Adjusted Scale    1.006409     0.077776    54

With these values in hand, the linkage between the paper and online bank parameters was applied, placing the paper bank parameters on a common, mode-adjusted scale with the online bank.

Conclusion

The mode comparability study described in this document examined the comparability of scores from the online and paper administrations of the spring 2014 Mathematics MCA-III assessments. A matched samples analysis indicated the existence of mode effects in the obtained item parameter estimates. The identified mode linking constants were applied to produce mode-adjusted parameter estimates, placing the paper item parameters on the Grade 11 online scale. Final student scores will be computed using the equivalent-scale item parameters appropriate to the mode in which each test was administered. Adjusting for mode effects and placing the paper and online parameters on a common scale allows direct comparison of scores across modes.

References

Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.

Way, W. D., Davis, L. L., & Fitzpatrick, S. (2006). Score comparability of online and paper administrations of the Texas Assessment of Knowledge and Skills. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA.