Comparing Apples with Oranges? Linking National and International Large-scale Assessments


Comparing Apples with Oranges? Linking National and International Large-scale Assessments
Olaf Köller
Leibniz Institute for Science and Mathematics Education (Kiel) and Centre for International Student Assessment (Munich)
"Standard setting: International state of research and practices in the Nordic countries", Oslo, September 23, 2015
Prof. Dr. Olaf Köller, Leibniz Institute for Science and Mathematics Education

Comparing Apples with Oranges? Co-operation Project
Annika Nissen, Timo Ehmke, Olaf Köller, Christoph Duchardt
Core reference: Nissen, A., Ehmke, T., Köller, O., & Duchardt, C. (2015). Comparing apples with oranges? An approach to link TIMSS and the National Educational Panel Study in Germany via equipercentile and IRT methods. Studies in Educational Evaluation, 47, 58-67. DOI: 10.1016/j.stueduc.2015.07.003. ISSN 0191-491X.

Starting point: Large-scale Assessments in Germany
Timetable 2009 to 2016 (number of assessment cycles marked per study): PIRLS 2, TIMSS 2, PISA 3, NA-PS 2, NA-SS 3, NEPS 3
PIRLS: Progress in International Reading Literacy Study
TIMSS: Trends in International Mathematics and Science Study
NA-PS: National Assessment in Primary School (Grade 4; German, Mathematics)
NA-SS: National Assessment in Secondary School (Languages vs. Math & Science)
NEPS: National Educational Panel Study; data collections in Grades 5 and 9

Samples
PIRLS: Nationally representative sample of approx. n = 4,500 students at the end of grade 4
TIMSS: Nationally representative sample of approx. n = 4,500 students at the end of grade 4
PISA: Nationally representative sample of approx. n = 5,000 15-year-old students and of approx. n = 9,000 9th graders
NA-PS: Representative samples of all 16 federal states (approx. n = 2,000 per state) at the end of grade 4
NA-SS: Representative samples of all 16 federal states (approx. n = 3,000 per state) at the end of grade 9 (approx. n = 50,000)
NEPS: Nationally representative samples of students at the beginning of grade 5 (n = 7,500) and at the beginning of grade 9 (n = 15,000)

Test Designs
PIRLS: Multi-matrix design; 80 minutes testing (3PL)
TIMSS: Multi-matrix design; 80 minutes testing math and science (3PL)
PISA: Multi-matrix design; 120 minutes testing; major domain and minor domains (1PL)
NA-PS: Multi-matrix design; 80 minutes testing; parts of the sample take only mother tongue, others math, others mother tongue plus math (1PL)
NA-SS: Multi-matrix design; 120 minutes testing, 60 minutes for each domain (1PL)
NEPS: All students work on the same items; 30 minutes math, 30 minutes science, 30 minutes reading (1PL)

Research Questions
There is strong political as well as scientific pressure to link studies (see, e.g., similar activities in the USA, where NAEP Grade 8 and TIMSS were linked in 2011).
Different tests, different constructs?
Different tests, different proficiency level models?
Can we use national tests to assess our students on international scales, and vice versa?

Research Question I: Graphical Illustration

Research Questions II and III: Graphical Illustration (figure: national proficiency levels 1 to 5 for the national student and item sample, alongside international proficiency levels 1 to 5 for the international student and item sample)

Linking TIMSS, PIRLS, National Assessment, and National Educational
National Assessment German/Math: 4th graders (1 class per school), 1,300 schools
TIMSS/PIRLS: 4th graders (1 class per school), 201 schools
TIMSS/NEPS/National Assessment: 4th graders (1 class per school), 80 schools


Both studies measure mathematics competencies.
Aim: Link these studies to use international benchmarks.
Two linking methods are often distinguished:
Classical Test Theory equating
Item Response Theory equating
Which method fits the data better?

Which linking method (CTT or IRT) should be preferred with regard to:
(1) the descriptive statistics?
(2) the accuracy of classification into the TIMSS International Benchmarks?
(3) the analysis of different subgroups?

The two studies take different conceptual approaches, but both aim to measure mathematics competencies: TIMSS at the end of primary school, NEPS at the beginning of grade 5.

Linking TIMSS and National Educational

78 primary schools in Germany
80 classes
N = 733 fourth graders (52% male, 48% female)


TIMSS 2011, Mathematics, Grade 4:
3-parameter IRT model
Fixed item parameters from the international database
Students' plausible values (PVs) transformed into the international TIMSS achievement scale metric

NEPS 2010, Mathematics, Grade 5:
1-parameter Rasch model
Fixed item parameters taken from NEPS 2010
Students' weighted likelihood estimates (WLEs) transformed into positive integers only

Model 1 (2 Dim.) and Model 2 (1 Dim.)

| Model | N | Parameters | Deviance | AIC | BIC | CAIC |
|---|---|---|---|---|---|---|
| NEPS-LV 2-dim | 752 | 238 | 51338 | 51814 | 52023 | 52261 |
| NEPS-LV 1-dim | 752 | 236 | 51375 | 51847 | 52054 | 52290 |

1PL model; findings from ConQuest 3.0
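The information criteria in this model comparison can be recomputed from the deviance, the number of parameters, and N. The sketch below is only an arithmetic check, not a description of how ConQuest computes these values. The function name and the `log` argument are illustrative. AIC follows the usual deviance + 2k. The reported BIC and CAIC values are reproduced when log(N) is taken to base 10, which is an observation from the numbers rather than a documented convention; the natural logarithm would give larger BIC values.

```python
import math

def information_criteria(deviance, n_params, n_obs, log=math.log):
    """Return (AIC, BIC, CAIC) from deviance, parameter count and sample size.

    `log` is the logarithm used in the BIC/CAIC penalty (natural log by default).
    """
    penalty = log(n_obs)
    aic = deviance + 2 * n_params
    bic = deviance + n_params * penalty
    caic = deviance + n_params * (penalty + 1)
    return aic, bic, caic

# Deviance and parameter counts as reported on the slide (N = 752).
models = {"2-dim": (51338, 238), "1-dim": (51375, 236)}

for name, (deviance, k) in models.items():
    # Base-10 log is an assumption made only because it matches the slide.
    aic, bic, caic = information_criteria(deviance, k, 752, log=math.log10)
    print(name, round(aic), round(bic), round(caic))
```

Under every criterion in the table the two-dimensional model has the lower value.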

I. Classical Test Theory Equating, e.g. Equipercentile Equating (Cartwright, 2012; Hambleton et al., 2009)
1) Determine percentile ranks for the score distributions
2) Declare scores with the same percentile rank as equivalent
→ Linking basis: score distributions

II. IRT Linking (Pietsch et al., 2009; NCES, 2013)
1) Estimate item parameters
2) Scale the estimated parameters to a base IRT scale (linear transformation)
3) Transform true scores of the new test form to the true-score scale of the old form
→ Linking basis: modeling students' responses to items
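Step 2 of IRT linking, the linear transformation onto a base scale, can be illustrated with the mean/sigma method. This is a generic textbook technique swapped in for illustration. It assumes a set of common items with difficulty estimates on both scales, and it is not the procedure used in this study, which fixed the TIMSS item parameters. All numbers and names below are hypothetical.

```python
from statistics import mean, pstdev

def mean_sigma_constants(b_new, b_base):
    """Slope A and intercept B such that b_base is approximately A * b_new + B.

    b_new, b_base: difficulty estimates of the same common items on the
    new scale and on the base scale (same order in both lists).
    """
    slope = pstdev(b_base) / pstdev(b_new)
    intercept = mean(b_base) - slope * mean(b_new)
    return slope, intercept

# Hypothetical difficulties of three common items on both scales.
b_new = [-1.0, 0.0, 1.0]
b_base = [-0.5, 1.0, 2.5]

A, B = mean_sigma_constants(b_new, b_base)

# The same constants move ability estimates from the new scale to the base scale.
theta_new = [-0.3, 0.8]
theta_on_base = [A * t + B for t in theta_new]
print(A, B, theta_on_base)
```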

I. Equipercentile Equating
(1) Find the percentile rank for each score value of the TIMSS test and the NEPS test
(2) Match the scores by their corresponding percentile ranks using the software LEGS 2.01 (Brennan, 2004)
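The two steps can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a reimplementation of LEGS: it uses mid-percentile ranks and linear interpolation between observed score points, with no smoothing. The function names and the small score lists are hypothetical.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(sorted_scores, x):
    """Mid-percentile rank: percent scoring below x plus half the percent at x."""
    below = bisect_left(sorted_scores, x)
    at = bisect_right(sorted_scores, x) - below
    return 100.0 * (below + 0.5 * at) / len(sorted_scores)

def equipercentile_equate(x, from_scores, to_scores):
    """Map score x on the 'from' test to the 'to' score with the same percentile rank."""
    from_sorted = sorted(from_scores)
    to_sorted = sorted(to_scores)
    p = percentile_rank(from_sorted, x)

    # Percentile rank of every distinct score on the target test.
    values = sorted(set(to_sorted))
    ranks = [percentile_rank(to_sorted, v) for v in values]

    # Outside the observed range: clamp to the lowest or highest target score.
    if p <= ranks[0]:
        return float(values[0])
    if p >= ranks[-1]:
        return float(values[-1])

    # Interpolate linearly between the two neighbouring target scores.
    i = bisect_left(ranks, p)
    weight = (p - ranks[i - 1]) / (ranks[i] - ranks[i - 1])
    return values[i - 1] + weight * (values[i] - values[i - 1])

# Hypothetical scores of the same five students on two tests.
neps_like = [0, 29, 53, 74, 93]
timss_like = [355, 375, 380, 385, 395]
print(equipercentile_equate(53, neps_like, timss_like))
```

In a single-group design like this one, where the same students take both tests, the two score lists come from the same sample, so differences in group ability do not confound the mapping.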

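Results Equipercentile Equating

NEPS score distribution

| x* | freq | cum freq | f(x) | F(x) | P(x) |
|---|---|---|---|---|---|
| 0 | 3 | 3 | 0.00 | 0.00 | 0.21 |
| 29 | 2 | 5 | 0.00 | 0.01 | 0.55 |
| 53 | 7 | 12 | 0.01 | 0.02 | 1.16 |
| 74 | 8 | 20 | 0.01 | 0.03 | 2.18 |
| 93 | 7 | 27 | 0.01 | 0.04 | 3.21 |
| 111 | 6 | 33 | 0.01 | 0.05 | 4.09 |
| 127 | 10 | 43 | 0.01 | 0.06 | 5.18 |
| 142 | 15 | 58 | 0.02 | 0.08 | 6.89 |
| ... | ... | ... | ... | ... | ... |
| 478 | 5 | 678 | 0.01 | 0.92 | 92.16 |
| 493 | 18 | 696 | 0.02 | 0.95 | 93.72 |
| 510 | 6 | 702 | 0.01 | 0.96 | 95.36 |
| 528 | 12 | 714 | 0.02 | 0.97 | 96.59 |
| 570 | 13 | 727 | 0.02 | 0.99 | 98.30 |
| 597 | 1 | 728 | 0.00 | 0.99 | 99.25 |
| 631 | 2 | 730 | 0.00 | 1.00 | 99.45 |
| 754 | 3 | 733 | 0.00 | 1.00 | 99.80 |

TIMSS score distribution

| y** | freq | cum freq | g(y) | G(y) | Q(y) |
|---|---|---|---|---|---|
| 355 | 1 | 1 | 0.00 | 0.00 | 0.07 |
| 375 | 2 | 3 | 0.00 | 0.00 | 0.27 |
| 380 | 1 | 4 | 0.00 | 0.01 | 0.48 |
| 385 | 1 | 5 | 0.00 | 0.01 | 0.62 |
| 395 | 3 | 8 | 0.00 | 0.01 | 0.89 |
| 405 | 2 | 10 | 0.00 | 0.01 | 1.23 |
| 410 | 2 | 12 | 0.00 | 0.02 | 1.50 |
| 415 | 3 | 15 | 0.00 | 0.02 | 1.84 |
| ... | ... | ... | ... | ... | ... |
| 690 | 3 | 720 | 0.00 | 0.98 | 98.16 |
| 695 | 2 | 722 | 0.00 | 0.99 | 98.50 |
| 700 | 2 | 724 | 0.00 | 0.99 | 98.77 |
| 705 | 1 | 725 | 0.00 | 0.99 | 98.98 |
| 710 | 2 | 727 | 0.00 | 0.99 | 99.18 |
| 715 | 2 | 729 | 0.00 | 1.00 | 99.45 |
| 720 | 2 | 731 | 0.00 | 1.00 | 99.73 |
| 735 | 1 | 732 | 0.00 | 1.00 | 99.93 |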

Results Equipercentile Equating (figure: TIMSS scores, axis from 350 to 750, plotted against NEPS scores, axis from 0 to 800)

IRT Linking
(1) Scale TIMSS and NEPS data simultaneously in a single IRT model, fixing the item parameters of the TIMSS international scale
(2) Calibrate NEPS data with the item parameters of the common scaling
(3) Convert students' scores into score equivalents on the TIMSS international scale

Results Equating

Descriptive statistics for NEPS and TIMSS

| | Mean | SD | Skew | Kurt |
|---|---|---|---|---|
| NEPS | 307 | 118 | 0.27 | 3.35 |
| TIMSS | 545 | 64 | 0.07 | 3.06 |
| Equipercentile Equating | 545 | 63 | 0.06 | 2.99 |
| IRT Equating | 545 | 72 | 0.07 | 2.79 |

Classification of students to TIMSS 2011 International Benchmarks in Mathematics

| | < low | low | intermediate | high | advanced | Sum | Cohen's Kappa |
|---|---|---|---|---|---|---|---|
| TIMSS | 1.2% | 13.1% | 38.6% | 37.7% | 9.4% | 100.0% | |
| Equipercentile Equating | 0.7% | 14.3% | 39.6% | 37.2% | 8.2% | 100.0% | 0.384 |
| IRT Equating | 1.6% | 13.4% | 39.6% | 33.7% | 11.7% | 100.0% | 0.371 |
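Classifying a linked score into a benchmark category is a lookup against cut scores. The sketch below assumes the published TIMSS International Benchmark cut points of 400, 475, 550 and 625, which are not stated on the slide. The function names and the example scores are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Assumed cut scores: low, intermediate, high, advanced.
CUTS = [400, 475, 550, 625]
LABELS = ["< low", "low", "intermediate", "high", "advanced"]

def benchmark(score):
    """Return the highest benchmark category that the score reaches."""
    return LABELS[bisect_right(CUTS, score)]

def distribution(scores):
    """Percentage of scores in each benchmark category."""
    counts = Counter(benchmark(s) for s in scores)
    return {label: 100.0 * counts[label] / len(scores) for label in LABELS}

# Hypothetical linked scores for four students.
example_scores = [390, 480, 560, 630]
print(distribution(example_scores))
```

Applying the same function to the original TIMSS scores and to the linked scores of the same students gives paired classifications, from which an agreement statistic such as Cohen's kappa can be computed.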

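Results Equating: cross-classification (percent of students), IRT Linking by Equipercentile Equating

| IRT Linking \ Equipercentile Equating | 1 | 2 | 3 | 4 | 5 | Sum |
|---|---|---|---|---|---|---|
| 1 | 0.4 | 0.7 | 0.0 | 0.0 | 0.0 | 1.1 |
| 2 | 0.0 | 6.1 | 3.7 | 0.2 | 0.0 | 9.9 |
| 3 | 0.0 | 0.0 | 25.1 | 8.4 | 0.1 | 33.6 |
| 4 | 0.0 | 0.0 | 0.0 | 37.8 | 2.6 | 40.4 |
| 5 | 0.0 | 0.0 | 0.0 | 1.9 | 13.1 | 15.0 |
| Sum | 0.4 | 6.8 | 28.8 | 48.3 | 15.8 | 100.0 |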
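Agreement between the two linking methods can be summarized from this cross-classification. The sketch below computes the share of students placed in the same level by both methods (the diagonal) and Cohen's kappa from the cell percentages. It works only from the rounded values shown, so the results are approximate, and the function name is illustrative.

```python
# Cell percentages from the cross-classification (levels 1 to 5 by levels 1 to 5).
table = [
    [0.4, 0.7, 0.0, 0.0, 0.0],
    [0.0, 6.1, 3.7, 0.2, 0.0],
    [0.0, 0.0, 25.1, 8.4, 0.1],
    [0.0, 0.0, 0.0, 37.8, 2.6],
    [0.0, 0.0, 0.0, 1.9, 13.1],
]

def agreement_and_kappa(cells):
    """Return (observed agreement, Cohen's kappa) for a square table."""
    total = sum(sum(row) for row in cells)
    # Renormalize, because rounded percentages need not sum to exactly 100.
    p = [[c / total for c in row] for row in cells]
    k = len(p)
    observed = sum(p[i][i] for i in range(k))
    row_margins = [sum(p[i]) for i in range(k)]
    col_margins = [sum(p[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if the two classifications were independent.
    expected = sum(r * c for r, c in zip(row_margins, col_margins))
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

observed, kappa = agreement_and_kappa(table)
print(f"same level: {100 * observed:.1f}%, kappa: {kappa:.2f}")
```

Almost all disagreements lie one level off the diagonal, which is why the exact-agreement share is high.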

Discussion
Application of linking NEPS and TIMSS:
Criterion-based interpretation of the NEPS mathematics scores
Basis for longitudinal studies on students who fail to reach the lowest (or who reach the highest) educational standards

In the validation study:
a) Both methods lead to
→ the same estimates of population means
→ satisfactory classification accuracy for proficiency levels
→ similar skewness and kurtosis
b) Equipercentile methods should be preferred regarding
→ estimation of standard deviations

Thank you very much for your attention!
Contact: koeller@ipn.uni-kiel.de