School tracking and development of cognitive skills additional results



Similar documents
An Alternative Way to Measure Private Equity Performance

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

Can Auto Liability Insurance Purchases Signal Risk Attitude?

How To Calculate The Accountng Perod Of Nequalty

Evaluating the Effects of FUNDEF on Wages and Test Scores in Brazil *

Marginal Returns to Education For Teachers

Heterogeneous Paths Through College: Detailed Patterns and Relationships with Graduation and Earnings

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting

To manage leave, meeting institutional requirements and treating individual staff members fairly and consistently.

14.74 Lecture 5: Health (2)

Marginal Benefit Incidence Analysis Using a Single Cross-section of Data. Mohamed Ihsan Ajwad and Quentin Wodon 1. World Bank.

Staff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

Overview of monitoring and evaluation

DEFINING %COMPLETE IN MICROSOFT PROJECT

PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIGIOUS AFFILIATION AND PARTICIPATION

Hollinger Canadian Publishing Holdings Co. ( HCPH ) proceeding under the Companies Creditors Arrangement Act ( CCAA )

CHAPTER 14 MORE ABOUT REGRESSION

Does Higher Education Enhance Migration?

Traffic-light a stress test for life insurance provisions

HOUSEHOLDS DEBT BURDEN: AN ANALYSIS BASED ON MICROECONOMIC DATA*

STAMP DUTY ON SHARES AND ITS EFFECT ON SHARE PRICES

Calculation of Sampling Weights

1. Measuring association using correlation and regression

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Intra-year Cash Flow Patterns: A Simple Solution for an Unnecessary Appraisal Error

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1.

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

Military Conscription and University Enrolment: Evidence from Italy

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

! # %& ( ) +,../ # 5##&.6 7% 8 # #...

Forecasting the Direction and Strength of Stock Market Movement

Using Series to Analyze Financial Situations: Present Value

Statistical Methods to Develop Rating Models

The Current Employment Statistics (CES) survey,

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy

CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES

Multiple-Period Attribution: Residuals and Compounding

Searching and Switching: Empirical estimates of consumer behaviour in regulated markets

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

Financial Mathemetics

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

7.5. Present Value of an Annuity. Investigate

DO LOSS FIRMS MANAGE EARNINGS AROUND SEASONED EQUITY OFFERINGS?

LIFETIME INCOME OPTIONS

Returns to Experience in Mozambique: A Nonparametric Regression Approach

Survive Then Thrive: Determinants of Success in the Economics Ph.D. Program. Wayne A. Grove Le Moyne College, Economics Department

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Data Mining from the Information Systems: Performance Indicators at Masaryk University in Brno

Student Performance in Online Quizzes as a Function of Time in Undergraduate Financial Management Courses

The OC Curve of Attribute Acceptance Plans

Scale Dependence of Overconfidence in Stock Market Volatility Forecasts

Small pots lump sum payment instruction

Traditional versus Online Courses, Efforts, and Learning Performance

Gender differences in revealed risk taking: evidence from mutual fund investors

An Interest-Oriented Network Evolution Mechanism for Online Communities

HARVARD John M. Olin Center for Law, Economics, and Business

Two Faces of Intra-Industry Information Transfers: Evidence from Management Earnings and Revenue Forecasts

Economic Interpretation of Regression. Theory and Applications

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

Stress test for measuring insurance risks in non-life insurance

An Empirical Study of Search Engine Advertising Effectiveness

ADVERSE SELECTION IN INSURANCE MARKETS: POLICYHOLDER EVIDENCE FROM THE U.K. ANNUITY MARKET *

Tuition Fee Loan application notes

Bargaining at Divorce: The Allocation of Custody

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Capacity-building and training

Demographic and Health Surveys Methodology

Management Quality, Financial and Investment Policies, and. Asymmetric Information

Calculating the high frequency transmission line parameters of power cables

Management Quality and Equity Issue Characteristics: A Comparison of SEOs and IPOs

Analysis of Premium Liabilities for Australian Lines of Business

Section 5.4 Annuities, Present Value, and Amortization

High Correlation between Net Promoter Score and the Development of Consumers' Willingness to Pay (Empirical Evidence from European Mobile Markets)

Which Factors Determine Academic Performance of Economics Freshers?. Some Spanish Evidence

SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background:

STATISTICAL DATA ANALYSIS IN EXCEL

SIMPLE LINEAR CORRELATION

IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS

What is Candidate Sampling

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

General or Vocational? Evidence on School Choice, Returns, and Sheep Skin Effects from Egypt 1998

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

Wage inequality and returns to schooling in Europe: a semi-parametric approach using EU-SILC data

Dynamics of Toursm Demand Models in Japan

Number of Levels Cumulative Annual operating Income per year construction costs costs ($) ($) ($) 1 600,000 35, , ,200,000 60, ,000

Transcription:

ömmföäflsäafaäsflassflassflas fffffffffffffffffffffffffffffffffff Dscusson Papers School trackng and development of cogntve sklls addtonal results Sar Pekkala Kerr Wellesley College Tuomas Pekkarnen Aalto Unversty, IZA and IFAU Roope Uustalo HECER, IZA and IFAU Dscusson Paper No. 350 August 2012 ISSN 1795-0562 HECER Helsnk Center of Economc Research, P.O. Box 17 (Arkadankatu 7), FI-00014 Unversty of Helsnk, FINLAND, Tel +358-9-191-28780, Fax +358-9-191-28781, E-mal nfo-hecer@helsnk.f, Internet www.hecer.f

HECER Dscusson Paper No. 350 School trackng and development of cogntve sklls addtonal results* Abstract Ths paper evaluates the effects of selectve vs. comprehensve school systems on mltary test scores n mathematcal, verbal and logcal reasonng sklls tests. We use data from the Fnnsh comprehensve school reform whch replaced the old two-track school system wth a unform nne-year comprehensve school. The paper uses a dfferences-n-dfferences approach and explots the fact that the reform was mplemented gradually across the country durng a sx-year perod. We fnd that the reform had a small postve effect on the verbal test scores, but no effect on the mean performance n the arthmetc or logcal reasonng tests. However, the reform sgnfcantly mproved scores on all tests for the students whose parents had only basc educaton. JEL Classfcaton: H52, I21 Keywords: educaton, school system, trackng, comprehensve school, test scores. Sar Pekkala Kerr Tuomas Pekkarnen Wellesley College Aalto Unversty School of Economcs WCW, 106 Central Street P.O. Box 21240 Wellesley, MA 02481 00076 Aalto USA FINLAND e-mal: skerr3@wellesley.edu e-mal: tuomas.pekkarnen@aalto.f Roope Uustalo Helsnk Center of Economc Research (HECER) P.O. Box 17 (Arkadankatu 7) FI-00014 Unversty of Helsnk FINLAND e-mal: roope.uustalo@helsnk.f * Ths workng paper contans addtonal results that dd not ft nto the verson submtted to Journal of Labor Economcs. The authors would lke to thank Davd Autor, Alexs Brownell, Lda Farré, Wllam Kerr, Sandra McNally, Eva Mörk, as well as semnar partcpants at London School of Economcs, Unverstat de Alcante, Wellesley College, ESPE Conference n London, and EALE Conference n Amsterdam for helpful comments. Pekkarnen and Pekkala Kerr are grateful for fnancal assstance from the Academy of Fnland and Yrjö Jahnsson Foundaton, respectvely.

1. Introducton Internatonal comparsons of student achevement, such as the OECD s Programme for Internatonal Student Assessment (PISA), have generated a growng nterest n the effect of school systems on student outcomes. Accordng to these comparsons, the dfferences n average test results across countres wth roughly equal school resources are very large. Also, the dsperson of test scores vares consderably across countres. One potental explanaton for cross-country dfferences n the level and, n partcular, the varance of achevement scores s the extent and tmng of trackng, or ablty groupng of students. For example, the OECD has repeatedly argued that varaton n student performance tends to be hgher n countres wth early trackng polces (e.g. OECD, 2003). Hgh varance of student achevement and ts correlaton wth famly background have been seen as problematc from the perspectve of equalty of opportunty. On the other hand, postponng trackng could lower the qualty of teachng at least for the hgh-ablty students. Implct n ths debate s an effcency-equty trade-off: postponement of trackng could mprove equalty but mght decrease the average achevement. The tmng of trackng dffers sgnfcantly between comprehensve and selectve school systems. In the selectve system, trackng students nto dfferent types of schools occurs early, and choces made as early as age ten largely determne later schoolng optons. In the comprehensve system, students often stay n the same schools untl the end of secondary school. In ths study we evaluate the effect of the Fnnsh comprehensve school reform on cogntve sklls tests. Fnland had a selectve two-track school system untl the 1970 s, when the school reform replaced the old two-track system wth a unform comprehensve school system that s smlar to those n other European countres. As a result of the reform the trackng age was postponed from age 10 to 15. The dfferences between the pre- and post-reform systems are smlar to the cross-country dfferences n school systems n the OECD countres today. The effects of the Fnnsh reform are therefore nformatve for the current schoolng polcy debate. Prevous studes such as Meghr and Palme (2005) as well as Pekkarnen et al. (2009) have shown that comprehensve school reforms dd mprove the equalty of opportunty by decreasng the ntergeneratonal correlaton of earnngs. However, the earnngs effects 1

reported n prevous studes could be due to peer effects, socal networks, openng of new educatonal opportuntes or drect mpact on productve sklls. Usng prevously unavalable data from the Fnnsh Defense Forces we can partally open the black box relatng school systems to labor market outcomes. We use scores from cogntve sklls tests taken at the begnnng of mandatory mltary servce. These data allow us to examne how the comprehensve school system affected arthmetc and verbal sklls as well as logcal reasonng ablty. Ths s partcularly mportant as recent research emphaszes the role of cogntve sklls, rather than mere school attanment, as mportant determnant of ndvdual earnngs, the dstrbuton of ncome and economc growth. (Hanushek and Wößmann, 2008). Yet exstng evdence on the mpact of major school reforms on cogntve sklls s stll scarce. We use a smlar dfferences-n-dfferences approach as n Pekkarnen et al. (2009) and explot the fact that the school reform was mplemented gradually across regons. However, we evaluate the effects of the school reform drectly on the dstrbuton of sklls that the students are supposed to learn n school. Our results show that the reform had a small postve effect on verbal test scores, but lttle effect on the mean performance n the arthmetc or logcal reasonng tests. However, the reform sgnfcantly mproved the scores n all tests for students whose parents had only basc educaton or low ncome. At the same tme, the reform had no negatve effects on the test scores of students from more advantaged backgrounds. These results are qualtatvely n lne wth the results n Pekkarnen et al. (2009) where t was shown that the comprehensve school reform had a substantal negatve effect on the ntergeneratonal ncome elastcty. However, we fnd that the effects on cogntve sklls are far too small to fully explan the effects on ncome. The rest of the paper proceeds as follows. In the next secton, we revew the lterature on the effects of school trackng and compare our approach to the prevous studes. We then descrbe the content and the mplementaton of the Fnnsh comprehensve reform. The fourth secton descrbes the data and the Fnnsh Army Basc Sklls Test, the results of whch we use as a dependent varable. We then move on to present the dfferences-n-dfferences and maxmum lkelhood estmaton of the effect of the reform on test scores n secton fve and n the sxth secton we dscuss the results. The seventh secton concludes. 2

2. Prevous lterature Economc theory provdes somewhat ambguous predctons on the effect of comprehensve versus selectve school systems on student achevement. A comprehensve system, where the entre cohort s n the same class, ncreases heterogenety n the classroom. Ths probably makes classes more dffcult to teach and thereby may lower student achevement (Lazear, 2001). However, any changes n the class composton may also affect student achevement due to peer effects. The effect on the mean achevement depends on whether good students are harmed by bad students more than bad students beneft from beng around good students. 1 Even f the effect on the average student achevement s ambguous, a comprehensve school system should decrease the varance of test scores. Ths s due both to a more homogenous currculum and to less segregated peer groups. Furthermore, as early educatonal choces are more lkely to be determned by famly background (Brunello and Checch, 2007), later trackng age n the comprehensve system may reduce the correlaton between the test scores and the famly background. The most convncng emprcal evdence on the effects of ablty trackng on test scores comes from a feld experment n Kenya where randomly selected schools mplemented trackng and non-trackng polces. Duflo et al. (2011) show that trackng wthn schools seems to beneft all students. However, t s not clear whether these results can be generalzed to developed countres where the student populaton s less heterogeneous. Furthermore, n selectve systems students are typcally not tracked wthn schools but to dfferent types of schools, whch necessarly mples that teacher qualty and currculum may vary consderably across the tracks. In addton, the debate on the relatve advantages of dfferent school systems s prmarly concerned wth trackng nto dfferent types of schools rather than trackng wthn schools. Hence, even a well-desgned randomzed experment of trackng wthn schools s unlkely to settle the polcy queston of whether the entre school system should be selectve or comprehensve. In developed countres, most of the exstng evdence on the potental benefts of selectve versus comprehensve system orgnates from cross-country comparsons. For example, Hanushek and Wößmann (2006) fnd that the varance n test scores n nternatonal student 1 In the Lazear (2001) model the peer effects cannot be dstngushed from the currculum effects. In hs model ll-behavng students stop the teachng for the entre class. Hence, average student qualty determnes how much tme the teacher can spend n teachng and how much materal can be covered. 3

assessments s hgher n countres where trackng takes place at an early age. At the same tme, early trackng seems to have generally negatve effects on mean performance. In contrast, Brunello and Checch (2007) and Waldnger (2006), who use a smlar cross-country approach, fnd no effects on the test score varance. These conflctng results from prevous studes reflect, n part, the dffcultes n analyzng the effects of school system based on cross-country data. Whle these studes try to control for varaton due to other factors by controllng for early test scores (Hanushek and Wößmann, 2006; Waldnger, 2006) or by usng tme varaton n the trackng age (Brunello and Checch, 2007), t s far from clear that all relevant cross-country dfferences are relably accounted for. Analyzng changes n test scores when a country swtches from a tracked to a comprehensve system appears to be a more promsng approach to dentfy the effects of the school system on student achevement. Prevous attempts to do ths nclude Kerckhoff et al. (1996) and Galndo-Rueda and Vgnoles (2005), both of whom study the effect of a gradual movement from a selectve school system to a comprehensve system n England. However, as noted by Mannng and Pschke (2006), the areas that frst swtched to the comprehensve system n England were on average poorer than the areas whch retaned the tracked system. It s therefore dffcult to dsentangle the effect of school systems from the regonal dfferences usng common data sources such as the Natonal Chld Development Survey, that only contan a sngle cohort. Relatve to the earler studes, the dstnct advantage of the Fnnsh reform s the avalablty of panel data from several cohorts, whch avods the need to rely exclusvely on crosssectonal varaton. The Fnnsh comprehensve school reform was mplemented gradually regon by regon between 1972 and 1977. Ths gradual mplementaton allows us to control for regonal varaton and any tme trends n student achevement usng a dfference-ndfferences approach, whch avods bases such as those dscussed by Mannng and Pschke. Furthermore, the data also nclude nformaton on famles, whch makes t possble to estmate the effect of the reform usng data on brothers who were placed nto dfferent school systems because they dffered n age. 4

3. Comprehensve school reform 2 3.1 Background Fnland ntroduced a wde-rangng comprehensve school reform n the 1970 s. Smlar reforms had already taken place n Sweden n the 1950s and n Norway n the 1960s (Meghr and Palme, 2005; Aakvk et al. 2010). The Fnnsh comprehensve school reform abolshed the old two-track school system and created a unform nne-year comprehensve school. The man motvaton of the reform was to provde equal educatonal opportuntes to all students, rrespectve of place of resdence or socal background. In the pre-reform system all students entered prmary school ( kansakoulu ) at the age of seven. After four years n the prmary school, at age 11, the students were faced wth the choce of applyng to general secondary school ( oppkoulu ) or contnung n the prmary school. Admssons to the general secondary school were based on an entrance examnaton, a teacher assessment and prmary school grades. Those who were admtted to the general secondary school (52% of the cohort n 1970) contnued frst n the junor secondary schools for fve years, and often went on to the upper secondary school for three addtonal years. At the end of the upper secondary school the students took the matrculaton examnaton that provded elgblty for unversty-level studes. Those who were not admtted or who dd not apply to the general secondary school track contnued n the prmary school. The prmary school lasted altogether eght years. The last two years of prmary school concentrated on teachng vocatonal sklls and were called contnuaton classes or cvc school. After an amendment n 1963 muncpaltes could further extend these cvc school courses by a year, thereby creatng a nne-year prmary school. The mnmum school leavng age, regardless of the track, was sxteen, unless the student had already completed all requred prmary school courses. The pre-reform system s descrbed schematcally n the left-hand panel of Fgure 1. [Fgure 1: SCHOOL SYSTEMS] 2 Ths secton draws on Pekkarnen et al. (2009). 5

3.2 Content of the comprehensve school reform The reform ntroduced a new currculum and changed the structure of prmary and secondary educaton. The new currculum ncreased the academc content of educaton compared to the old prmary school currculum by ncreasng the share of mathematcs and scences. In addton, one foregn language became compulsory for all students. The new comprehensve school currculum resembled the old general secondary school currculum and exposed the pupls who n the absence of the reform would have stayed n the prmary school, to a sgnfcantly more academc educaton. The structure of the post-reform school system s descrbed n the rght-hand panel of Fgure 1. The prevous prmary schools, cvc schools and junor secondary schools were replaced by nne-year comprehensve schools. At the same tme, the upper secondary school was separated from the junor secondary school nto a dstnct nsttuton. After the reform, all pupls followed the same currculum n the same establshments (comprehensve schools) up to age sxteen. After nne years n the new comprehensve school, students could choose between applyng to upper secondary school or to vocatonal schools. Admsson to both tracks was based solely on comprehensve school grades. Unlke comprehensve school reforms n many other European countres, the Fnnsh reform dd not extend the length of compulsory schoolng or change the mnmum school leavng age, whch had been sxteen ever snce 1957. Although the pre-reform system oblged the muncpaltes to provde only eght years of prmary school, analyss of qunquennal census data reveals that by the tme of the comprehensve school reform most muncpaltes were already offerng a full nne years of prmary school for pupls who dd not go to junor secondary school. In 1975, when the reform had not yet reached the nnth grade, 92.6% of the ffteen-year-olds that would be n the nnth grade f they progressed at the typcal speed were stll n school. Ths fracton remaned at 92.6% n 1980 when the reform had reached the nnth grade n all but the last reform regon. In 1985, well after the reform was completely mplemented, the fracton of those turnng ffteen whle stll n school was only slghtly hgher, 93.9%. To us, these numbers suggest that the comprehensve school reform dd not ncrease the actual mnmum school leavng age. The effect of the reform can thus be nterpreted as comng through changes n the currculum and n the tmng of trackng choces, whch naturally also mples changes n peer groups. 6

3.3 The mplementaton of the comprehensve school reform The mplementaton of the reform was preceded by a process of plannng that lasted for two decades. The frst expermental comprehensve schools started ther operaton n 1967. In 1968, the Parlament approved the School Systems Act (467/1968), accordng to whch the two track school system would be gradually replaced by a nne-year comprehensve school system. The adopton of the new school system was to take place between 1972 and 1977. A regonal mplementaton plan dvded the country nto sx mplementaton regons and dctated when each regon would mplement the comprehensve school system. Regonal school boards were created to oversee the transton process. The muncpaltes that were responsble for operatng the school system could not select the reform date but were forced to follow the plan desgned by the Natonal Board of Educaton. In each regon, pupls n the fve lowest prmary school grades started n the comprehensve school durng the fall term of the year stated n the mplementaton plan. After that, each ncomng cohort of frst graders started ther schoolng n the comprehensve school. The pupls who were already above the ffth grade n the year when the reform was mplemented n ther home regon completed ther schoolng accordng to the pre-reform system. Thus, n each regon t took approxmately four years to fully complete the reform. Fgure 2 llustrates how the reform spread through the Fnnsh muncpaltes durng 1972 to 1977. The muncpaltes n whch the reform was mplemented n 1972 were predomnantly stuated n the northernmost provnce of Lapland. In 1973, the reform was mplemented n the north-eastern regons, n 1974 n the northwest, n 1975 n the south-east, n 1976 n the south-west, and fnally, n 1977 n the captal regon of Helsnk. [Fgure 2: COMPREHENSIVE SCHOOL REFORM MAP] The comprehensve school reform faced ntense resstance. The most common argument aganst the reform was that abolshng trackng would reduce the qualty of educaton. As a compromse, ablty trackng was partally retaned so that math and foregn languages were taught at dfferent levels wthn the comprehensve school. Ths ablty groupng was eventually abolshed n 1985. A reform of ths scale naturally mpled mportant changes n teacher educaton and the nternal organzaton of schools. In the old system prmary school teachers were educated n 7

separate non-unversty nsttutons. Intally, these teachers contnued to teach n the new comprehensve schools. Eventually, teacher educaton was moved nto newly founded schools of educaton wthn unverstes. Over tme the reform therefore led to an ncrease n the average duraton and an mprovement n the qualty of teacher tranng. The mplementaton of the Fnnsh comprehensve school reform makes t n many ways a promsng natural experment for evaluatng the effects of trackng on student outcomes. A partcularly useful setup was created by the regonal mplementaton plan that dctated when each muncpalty moved nto the comprehensve school system. Ths allows us to use a fxed-effects approach to control for the changes over tme and any regonal dfferences between the muncpaltes that were assgned to dfferent mplementaton regons. Gven that we use data from the very frst cohorts that were affected by the reform, the effects are lkely to be somewhat attenuated. For example, n the early years ablty trackng was retaned n certan subjects, most teachers stll only had the shorter prmary school teacher educaton, and the mergng of separate schools nto the same physcal unts probably caused dsruptons n the organzatonal structure. 4. Data A fundamental problem n assessng the effects of school reform on student performance s that students n separate school systems rarely partcpate n comparable tests. Sometmes t s possble to use naton-wde performance evaluatons or nternatonal comparsons of student achevement. However, snce most large-scale school reforms took place n 1960s and 1970s when testng was not as wdespread as today, t s dffcult to fnd tests admnstered to representatve and reasonably large samples of students from both pre- and post-reform school systems. Ths paper uses the results from the Basc Sklls test of the Fnnsh Army. Snce mltary servce s mandatory for men n Fnland, almost the entre male cohort takes the test. The average age at tme of testng s 20, so obvously factors other than the school system may also have had an effect on the test results. On the other hand, the longer-lastng outcomes of school systems are probably more nterestng than the mmedate effects on test results. In addton, the Basc Sklls test s also a strong predctor of earnngs and occupaton later n lfe, 8

so any effect of school system on the test scores wll have mportant consequences for lfetme earnngs. 3 The Fnnsh Army Basc Sklls test s desgned to measure general abltes. The Army uses these test results n selectng conscrpts to offcer tranng. Unlke the Swedsh and Norwegan mltary test score data, used for example by Black, Devereux, and Salvanes (2010) and by Lndqvst and Vestman (2011), the Fnnsh test score data are avalable n a dsaggregated form. The test conssts of three subcategores: verbal, arthmetc, and logcal reasonng. Each subtest ncludes forty multple choce questons. In the verbal reasonng subtest, the subject has to choose synonyms or antonyms of gven words, select words that belong to the same category as a gven word, exclude words from a group of words, and dentfy smlar relatonshps between word pars. The arthmetc reasonng test asks the subject to complete number seres, solve verbally expressed mathematcal problems, compute smple arthmetc operatons, and choose smlar relatonshps between pars of numbers. The logcal reasonng test s a standard culture free ntellgence test based on Raven s progressve matrces and ts results should therefore be less affected by pre-test schoolng. 4 On the other hand, both the verbal and arthmetc reasonng categores test sklls that are prmarly taught n school. The test was orgnally created n 1955 and re-desgned n 1981. Exactly same test was used over the entre tme span analyzed here. From 1982 onward, the test results are stored n the Army database that also ncludes personal dentfcaton numbers, makng t possble to lnk the test results to nformaton on test takers from other data regsters. Our data nclude all conscrpts who were born between 1962 and 1966 and who were found n the Army database,.e., those who started ther mltary servce after January 1982. There s some selectvty n the data due to the fact that t s possble to enter mltary servce as a volunteer before age 20. Thus some men n the oldest cohorts served before the Army regster was created. We expermented wth several solutons to ths problem. For example, we lmted the analyss to those who served n the army at age 20. We also restrcted the data to men born between 1964 and 1966,.e., those that we can observe even f they volunteered for early servce at age 18 or 19. Snce ths made qualtatvely lttle dfference, we only report the results from usng the full sample and smply control for age at test. 5 3 Uustalo (1999) reports that recruts who scored one standard devaton hgher n the Basc Sklls tests earn on average 6% more than recruts wth smlar educaton and experence but lower test scores. 4 The contents of the tests are descrbed n detal n Thonen et al. (2005). 5 Results based on restrcted samples can be found n Appendx 3. 9

It s possble to be exempted from the mltary servce due to relgous or ethcal convcton though n 1980s ths was rare. More common reasons for beng exempt from mltary servce were severe health condtons, most often related to mental health problems. However, even these crtera were substantally strcter n the 1980s than today. A comparson of the number of observatons by brth cohort n our data and the correspondng cohort sze n the 1980 populaton census reveals that our test score data contan nformaton on 85.3 percent of the relevant male brth cohorts 6. Ths corresponds closely to the reported fracton of the cohort that served n the mltary n the 1980s (Fnnsh Defense Command, 2000). Fgure 3 plots the dstrbuton of the raw scores,.e. the number of correct answers n each subtest. The dstrbuton of the average score s plotted n the bottom rght corner. These hstograms clearly show that there s plenty of varaton n the test scores; the raw scores are dstrbuted over the whole range from zero to forty. Also, the dstrbuton of the average scores, s symmetrc and not very far from normal dstrbuton. [Fgure 3: DISTRIBUTION OF THE TEST SCORES] Per our request, Statstcs Fnland lnked the test scores to census data on the Fnnsh populaton. The Statstcs Fnland longtudnal census fle contans data on the entre populaton lvng n Fnland n 1970, -75, -80, -85 and -90. From 1990 onwards nformaton s avalable annually. Fnnsh census data are based almost entrely on admnstratve regsters. For example, nformaton on the place of resdence n each census year s based on the Populaton Regster. In general, these regster data are of very hgh qualty. Only a few persons have any mssng data, and the man reasons for not beng ncluded n the census data are resdng abroad and death. Therefore practcally all conscrpts were found n the regster data and our data does not suffer from attrton problems that often plague smlar studes. Census data were used to gather nformaton on the pupls date of brth and place of resdence n 1970, -75 and -80, whch jontly determne whether the ndvdual attended a tracked or a comprehensve school. Statstcs Fnland does not release these data wth a muncpalty-dentfer, but per our request created an ndcator classfyng muncpaltes nto sx categores accordng to the year n whch the comprehensve school reform was 6 Our data are collected from the Fnnsh Army database and contan no nformaton on those who dd not serve n the mltary. It s also clear that those dsmssed from the army due to health and mental reasons dffer n many ways from the rest of the populaton. However, a smple comparson of the number of observatons per cohort and regon n our data to the overall cohort sze n each regon ndcates that the comprehensve school reform had no effect on the lkelhood of servng n the mltary. 10

mplemented n each. Except for those who moved between census years between two muncpaltes that mplemented the reform at dfferent years, t s possble to accurately determne whch school system was n place when the students were at the relevant age. The movers were dropped from the data used below, resultng n a reducton of the sample by 10%. 7 The census data also nclude famly codes that can be used to dentfy brother pars and to gather nformaton on parents educaton and earnngs. To be more exact, these famly codes are based on persons lvng n the same household, not necessarly bologcal famly members. We use the famly codes from the 1975 census, when the oldest men n the sample were thrteen years old and most lkely stll lvng at home. Table 1 reports the mean test scores by cohort and reform mplementaton regon as devatons from overall sample means n standard devaton unts. Implementaton regons are defned by the year when the reform took place n the regon. There are large dfferences across regons and a general ncrease n the test scores over tme. These regonal dfferences are correlated wth the average parental educaton and ncome levels of the regons, reported n the last two rows of the table. An ncrease n test scores over tme, generally known as the Flynn effect, has also been documented usng the same data by Kovunen (2007) for a longer tme perod. However, ths effect also reflects dfferences between cohorts other than those due to the school system. The shaded area of the table ndcates the students who attended comprehensve school. Snce these students are younger and are concentrated n the regons wth below average test scores, t s obvous that a cross-secton comparson of regons or a tme-seres comparson of subsequent cohorts would not produce relable estmates for the effect of the comprehensve school reform. Smlar tabulatons are shown for the three subtests n Appendx 1. [Table 1: MEAN SCORE BY COHORT & REGION] 7 As a robustness check we ncluded the movers and determned the reform year based on the place of resdence n the 1975 census. Ths made practcally no dfference for the results. 11

5. Estmaton methods Our goal s to estmate the causal effect of the school regme on the Army test scores. That s, to determne how an average student, or a student wth certan characterstcs, would have fared, had she or he been assgned to the reformed comprehensve system nstead of the prevous selectve early trackng system. A fxed effects approach s used to control for regonal dfferences as well as general trends over tme. The effect of the comprehensve school reform s dentfed because the tmng of the reform dffers across regons. Most of the estmates are based on the followng regresson model: y D j Dt A C j t (1) ( ) ( ) ( ) ( ) where y s the army test score of ndvdual, attendng school n regon j and belongng to cohort t. D j and D t are regon and cohort specfc dummes and s the age at test. C j()t() s an ndcator, varyng across cohorts and regons, that pupl attended comprehensve school. The parameter of nterest n (1) s. The dentfyng assumpton s that the comprehensve school ndcator, C j()t(), s uncorrelated wth the error term condtonal on the other regressors. Ths assumpton, and the fact that D j and D t enter (1) addtvely, reflect the basc dfferencesn-dfferences assumptons. Note, n partcular, that we make no assumptons regardng the smlarty of the regons where the reform took place early to those where reform took place later, nor do we clam that reform dates were randomly assgned. The parameter s an unbased estmate of the average causal effect of comprehensve schoolng, f the tmng of the reform s uncorrelated wth other regon-specfc changes n student outcomes. It s mportant to notce that regresson (1) controls for the regonal dfferences wth sx mplementaton regon dummes but not for the full set of more than fve hundred muncpal fxed effects. The man reason s that we have no access to the muncpalty codes due to data protecton regulatons. However, the only reason to nclude muncpalty dummes n regresson (1) would be the concern that the reform took place earler n non-randomly selected muncpaltes. But ths s only a problem f the reform dummy s correlated wth the muncpalty fxed effects. Ths correlaton s fully absorbed by ntroducng the sx mplementaton regon fxed effects, snce wthn these regons the mplementaton year does not vary across muncpaltes. 12

The fact that we only have 5 (cohorts) x 6 (regons) = 30 observatons for dentfcaton purposes obvously has mplcatons for statstcal nference. We deal wth ths by clusterng the standard errors at regon level usng the Moulton-procedure programmed for Stata by Angrst and Pschke (2009). Followng Cameron, Mller, and Gelbach (2007), we also make a small sample correcton for the standard errors by magnfyng the resduals by G /( G 1), where G s the number of regons, and use crtcal values from a t-dstrbuton wth G-2 degrees of freedom. 8 To facltate the nterpretaton of the statstcal sgnfcance of the results wth non-standard crtcal values, we report the 95% confdence ntervals nstead of standard errors of our regresson coeffcents n the tables. We also estmate (1) by nteractng C j()t() wth parental educaton and ncome. These results are nformatve n evaluatng whether the reform was partcularly successful n mprovng the cogntve sklls of students from dsadvantaged famly backgrounds relatve to other students. Furthermore, we also evaluate the effect of the reform on the varance of the test scores. A straghtforward approach for examnng ths s to model smultaneously the effect of the reform on both the mean and the varance of the test scores. Assumng that the error term follows a normal dstrbuton, the test scores wll be dstrbuted as y ~ 1 2 2 j( ) t( ) exp 1 2 y ( D j( ) D t( ) 2 j( ) t( ) A C j()t() ) 2 (2) 2 The subscrpts n jt ndcate that the varance n the test scores may vary across regons and cohorts and may therefore be affected by the reform. The model s parameterzed by assumng that log-varance s an addtve functon of the regon, cohort and reform dummes. 8 We were also concerned that seral correlaton wthn regons could lead to a bas n the standard errors. However, calculatng mean resduals by regon and cohort from equaton 1 and regressng these mean resduals on the lagged resduals ndcates that the frst order autocorrelatons of the resduals n dfferent test tems are between -.09 and.11. These estmates are consstent wth the null of no autocorrelaton accordng to an autocorrelaton test suggested by Wooldrdge (2002),.e. regressng the frst dfferenced resduals on ther lagged values and testng whether the resultng coeffcent s equal to -.5. As a fnal check we also estmated the confdence ntervals usng a wld bootstrap-t procedure that retans the cluster structure n the data and, accordng to Monte Carlo results by Cameron et al. (2008), performs well even wth only 6 clusters. In general, the wdth of the confdence ntervals does not vary much across the dfferent cluster correcton procedures, nor are the cluster corrected standard errors very dfferent from the OLS standard errors once cohort and regon fxed effects are ncluded n the model. We report alternatve estmates of confdence ntervals of key parameters n Appendx 2, but our man concluson s that, gven the lack of sgnfcant autocorrelaton n the data, the choce of cluster correcton method s not very mportant. Ths fndng s consstent wth the Monte Carlo experments n Bertrand et al. (2004). 13

14 ) exp( 2 jt t j jt C D D (3) The log-lkelhood functon of the normal heteroskedastc model s N t j t j t j t j N t j t j C A D D C A D D y C A D D N L 1 ) ( ) ( ) ( ) ( 2 ) ( ) ( ) ( ) ( 1 ) ( ) ( ) ( ) ( exp 2 1 2 1 ) ln( 2 2 ln (4) where measures the effect of the reform on the mean score and ts effect on the varance. 6. Results 6.1 Average effects The baselne results are reported n Table 2. To facltate the quanttatve nterpretaton of the results, the test scores are converted nto standard devaton unts. Column (1) smply regresses the average test score on the comprehensve school dummy, and shows that those who attended comprehensve school scored on average 0.095 standard devatons lower n the Army test. However, the results n column (2) reveal that ths negatve correlaton reflects the fact that regons wth on-average lower test scores mplemented the reform frst. When full sets of brth cohort and regon dummes are ncluded n the regresson, ths negatve correlaton s removed and we fal to fnd any sgnfcant effect of comprehensve school on average test scores. The causal nterpretaton of the result n column (2) of table 2 reles on the standard dfference-n-dfferences assumpton that the changes n test scores n the reform regons would have been smlar to the changes n the control regons n the absence of the reform. Gven that the panel spans several perods, ths assumpton can be relaxed somewhat by addng regon-specfc lnear trends n test scores as we do n column (3). The effect of the reform s now postve and close to beng statstcally sgnfcant at 5% level but very small at 0.025 standard devaton unts.

Column (4) adds famly fxed effects to the equaton, thus dentfyng the effect of the reform from the dfferences between brothers that attended dfferent school systems. The estmates become mprecse but are stll very close to those reported n column (3). Interestngly, addng famly fxed effects also reverses the postve trend n the test scores, ndcatng that the brth order effect on the test scores s larger than the dfference across the brth cohorts. 9 [Table 2: BASIC RESULTS] Table 3 examnes the effect of the school reform on dfferent tests n turn. Column (1) regresses each test score separately on the regon and cohort dummes and a dummy varable ndcatng whether the person attended a comprehensve school. Column (2) agan adds separate regon-specfc lnear trends and column (3) controls for the famly fxed effects. For brevty, only the coeffcents of the comprehensve school dummy are reported n each column. The comprehensve school reform had no sgnfcant effects on ether math or logcal reasonng tests. The effect on the verbal ablty test s postve and sgnfcant f regonal trends are ncluded n the model. The sze of the effect on verbal test scores ranges between 0.023 and 0.043 standard devaton unts. Famly fxed effect estmates tend to be much less precse than the estmates that explot between-famly varaton, and are therefore never sgnfcantly dfferent from zero or sgnfcantly dfferent from the pont estmates reported n columns (1) and (2). The fndng that the comprehensve school reform had ts largest effects on the verbal test was perhaps to be expected. After all, verbal sklls are learned n schools, and hence the changes n school system may have effects on these sklls. If ndeed the logcal reasonng test truly measures nnate reasonng abltes, pre-test schoolng should have lttle or no effect on the test. Fnally, the changes n the mathematcs teachng resultng from the reform were perhaps not as sgnfcant. As noted above, the ablty groupng was retaned n mathematcs and, as a result, math classes contnued to be taught at three dfferent ablty levels after the reform. [Table 3: EFFECTS ON DIFFERENT TEST ITEMS] Table 4 reports the maxmum-lkelhood estmates measurng the effects of the reform on both the mean and the log-varance of the test scores. These equatons are estmated separately for each test. All equatons nclude cohort and regon effects, regonal trends and 9 The brth order effect was also found n a Norwegan study of the Army test scores (Krstenssen and Bjerkdal, 2007). 15

age-at-test dummes on both the mean and the varance, but only the effects of the comprehensve school reform are reported. The maxmum-lkelhood method produces very smlar estmates for the effect of the reform on the mean scores as the lnear regresson model used n tables 2 and 3. The effects on means are sgnfcant only for the verbal test. The effects on the varance of the test scores are small but generally negatve. In the math test the effect s close to zero. In the verbal and logcal reasonng test the reform reduced the varance about 2.5 percent. None of these effects, however, are statstcally sgnfcant. [Table 4: EFFECTS ON MEAN AND VARIANCE] As shown n tables 2 and 3, the estmated effect of the reform on test scores s somewhat senstve to the ncluson of regon-specfc lnear trends. Ths rases a concern that the standard dfferences-n-dfferences specfcaton may fal to separate out pre-exstng trends from dynamc polcy responses, as ponted out by Wolfers (2006) n a dfferent context. To explore ths possblty, we re-estmated equaton (1) by addng dummes for years two and three pror to the reform and separate dummes for each of the fve years after the reform, thus omttng the dummy for one year pror to the reform. Ths flexble specfcaton should trace out any pre-reform trends and dynamc polcy responses followng the reform. The results are reported n table 5 and a vsual representaton for the average test scores s plotted n fgure 4. There s lttle ndcaton of any pre-reform trends. However, especally the verbal test scores, and to a lesser extent the average test scores, ncrease sgnfcantly n the frst year after the reform and stay at a hgher level or even grow n the years followng the reform. These results suggest that the pattern found n tables 2 and 3 does not reflect pre-exstng trends but rather stems from the true reform effect. [Table 5: THE EFFECT OF THE SCHOOL REFORM ALLOWING FOR LEADS & LAGS] [Fgure 4: LEADS AND LAGS] 6.2 Effects by parental background Tables 6A and 6B examne the effect of the comprehensve school reform on average test scores by famly background. Column (1) n table 6A estmates regresson models smlar to those reported n column (1) of table 3 but adds an ndcator of parents educaton and ts nteracton wth the reform dummy. Parents are classfed as beng hghly educated f at least 16

one parent has completed at least 12 years of educaton. In the pre-reform schoolng system ths generally refers to a stuaton where the parent attended the more academc track. In column (2), we add lnear regon-specfc trends to the regresson. Snce the nterest here s n the dfference of the effect of the reform between recruts from hgh- and low-educaton famles, we can control for regonal trends n a more flexble way n these regressons. In column (3), we add nteractons of brth cohorts and regonal dummes thus controllng also for any non-lnear regonal trends. After addng these nteractons the man effect of the reform on test scores s no longer dentfed, but the dfference of effect between recruts from hgh and low educated famles stll s. Fnally, n column (4), we further ntroduce the full nteractons of parental educaton wth cohort and mplementaton regon fxed effects as well as wth the test age dummes. Accordng to table 6A, parental educaton has a clear effect on test scores. Men wth hghly educated parents score, on average, 0.275 standard devatons hgher. The effect of the reform, now referrng to the effect on men wth less educated parents, s postve and larger than n table 2. However, the most remarkable result n table 6A s the negatve and sgnfcant estmate for the nteracton of parental educaton and comprehensve school reform. Accordng to the results the reform had sgnfcantly less effect on men wth more educated parents. Ths result s robust to ncludng regon-specfc lnear trends or full cohort regon nteractons although no longer sgnfcant once full nteractons are ntroduced n column (4). The pont estmates n table 6A ndcate that the reform ncreased the test scores of men from low-educaton famles by 0.031 0.047 standard devaton unts and but lttle or no effect on men from hgh-educaton famles. The results are qualtatvely smlar for ndvdual tests (not reported n table) and agan the effect s strongest n the case of verbal test scores where the reform ncreased the test scores of recruts from low educated famles by 0.06 standard devaton unts. Ths effect s szeable, amountng to a quarter of the effect of parental educaton. In table 6B we repeat the same analyss now usng parents ncome as the measure of famly background. The parents ncome s measured by summng the annual taxable ncome of both parents, nflatng the ncomes to the 2002 prce level and takng an average over the census years 1970, -75 and -80. To facltate the nterpretaton of the coeffcents, parental earnngs were normalzed by subtractng the sample mean. Ths demeanng has no mpact on the estmate of the nteracton of the reform and parental earnngs, but t makes t possble to 17

nterpret the man effect of the reform n table 6B as the effect of the reform on sons from famles wth sample mean ncome. The results are qualtatvely smlar to those usng parents educaton. Men wth rcher parents tend to have hgher average scores and the nteracton between the parents ncome and the reform dummy s negatve n all models but column (4) where t s not statstcally dfferent from zero. [Tables 6A and 6B: EFFECTS BY FAMILY BACKGROUND] 6.3 Magntude of the effects The results presented above suggest that the comprehensve school reform ncreased verbal test scores by approxmately 0.04 standard devaton unts. Ths may seem lke a small ncrease and t s therefore nstructve to put our results nto perspectve by comparng them to the effect szes found wth other educatonal reforms. The closest comparsons are the varous studes on the effect of trackng. Duflo et al. (2008) report that trackng students wthn schools n Kenya led to an average ncrease of test scores by 0.14 standard devaton unts. Comparng dfferent countres, Hanushek and Wößmann (2006) fnd no effect of early school trackng on mean test scores. Jacob and Ludwg (2008) survey the effect szes of varous educatonal polces targeted at chldren from low ncome famles n the US. The effects of these polces on test scores vary from a hgh of 0.22, related to a large reducton n class sze, to effects as low as 0.03 of polces such as Teach for Amerca. In addton, the polces surveyed by Jacob and Ludwg (2008) typcally target much younger chldren, where we should expect to see larger effects. Hence, n the lght of these results, the effects of the Fnnsh comprehensve school reform are small but not out of lne wth results from other educaton polces, especally gven that the effects here are measured on average four years after the completon of comprehensve school whle other studes tend to focus on more mmedate outcomes. Perhaps the most nterestng result reported above s the consstent postve effect of the reform on the test scores of recruts from low-educaton and low ncome famles. As ths mples a reducton n test score dfferences along socoeconomc lnes, t s nformatve to compare the test score results to the earler estmates of the effect of the comprehensve school reform on the ntergeneratonal ncome elastcty. 18

Pekkarnen et al (2009) found that the Fnnsh comprehensve school reform decreased the elastcty of sons earnngs wth respect to ther fathers earnngs by as much as 0.066 log ponts. In ths paper we report that the nteracton of the parents earnngs and the school reform s -0.036 n a comparable regresson equaton explanng the standardzed test scores (Table 6B, Column 3). Ths estmate can be scaled usng earler results on the effect of Army test scores on earnngs. For example, Uustalo (1999) calculated that a one standard devaton ncrease n these test scores mples about a sx percent ncrease n earnngs at age 33. Hanushek and Wößmann (2008) found that a one standard devaton ncrease n the Internatonal Adult Lteracy test score ncreases earnngs by about nne percent n Fnland. Multplyng -0.036 wth ether of these estmates ndcates that the effect of school reform on test scores only explans less than half a percent of the observed 6.6 percent declne n ntergeneratonal ncome elastcty. Even after allowng for generous correctons for measurement errors n the test scores, t s obvous that the effect of the school reform on measurable cogntve sklls s not suffcent to explan ts effect on the earnngs dstrbuton. 7. Conclusons Persstent dfferences n average test scores across countres and over tme have receved plenty of attenton n recent years. One often suggested explanaton for these dfferences s the educatonal system. In partcular, the trackng of pupls nto dfferent groups by ablty and aspratons has been consdered a potentally mportant factor. However, both the economc theory and the avalable emprcal evdence reman nconclusve when t comes to the effects of trackng regmes on test scores. Here, we estmate the effect of the comprehensve school reform on the Fnnsh Army Basc Sklls Test scores. Unlke prevous lterature that had to rely on cross-country comparsons or comparsons of regons wthn countres, here the effect of the comprehensve school reform on test scores s estmated usng a dfference-n-dfferences approach wth sngle-country data. As such, the current study provdes a more serous attempt at dentfyng the causal effect of school systems on test outcomes. On average, the reform had a small postve effect on the average verbal test scores and no sgnfcant postve or negatve effect on the average arthmetc or logcal reasonng test scores. Most nterestngly, however, for all of these tests, the effect of the reform was postve 19

and sgnfcant n famles where the parents had only basc educaton or low ncome, ndcatng a reducton n cogntve sklls dfferences along socoeconomc lnes. We argue that the changes n the dstrbuton of sklls are lkely to be due to the more academc currculum content and the change n peer groups that especally affected the students from less-advantaged famles. As s typcal wth any evaluaton of real-lfe polcy nterventons of ths scale, we cannot dsentangle the relatve mportance of these factors. However, most realstc changes n school trackng polces nvolve changes n both the teachng content and peer groups almost by defnton. Hence relable estmates of the overall causal effect of school trackng polces are hghly relevant both n terms of the emprcal study of educatonal polces as well as the current polcy debate on school trackng. 20

References Aakvk, A., K. G. Salvanes, and K. Vaage (2010), Measurng heterogenety n the returns to educaton n Norway usng an educatonal reform, European Economc Revew, 54(4), 483-500. Angrst, J., and J. S. Pschke (2009), Mostly Harmless Econometrcs: An Emprcst's Companon, Prnceton Unversty Press, Prnceton and Oxford. Bertrand, M., E. Duflo, and S. Mullanathan (2004), How Much Should We Trust Dfferences-n-Dfferences Estmates? Quarterly Journal of Economcs 119, 249 275. Black, S., P. Devereux, and K. G. Salvanes (2010), The more the smarter? Famly sze and IQ, Journal of Human Resources, 45(1), 33-58. Brunello, G., and D. Checch (2007), Does school trackng affect equalty of opportunty? New nternatonal evdence, Economc Polcy, Oct 2007, 781-861. Cameron, A., D. Mller, and J. B. Gelbach, (2008), Bootstrapped-based mprovements for nference wth clustered errors, Revew of Economcs and Statstcs, 90 (3), 414-427. Duflo, E., P. Dupas, and M. Kremer (2011), Peer effects, teacher ncentves, and the mpact of trackng: Evdence from a randomzed evaluaton n Kenya, The Amercan Economc Revew, 101 (5), 1739-1774. Fnnsh Defense Command (2000), The Fnnsh Defense Forces, Annual Report 2000 (n Fnnsh), Defense Command Publc Informaton Dvson, Helsnk. Galndo-Rueda, F., and A. Vgnoles (2005), The heterogeneous effect of selecton n secondary schools: Understandng the changng role of ablty, CEE Dscusson Paper 52. Hanushek, E., and L. Wößmann (2006), Does educatonal trackng affect performance and nequalty? Dfferences-n-dfferences evdence across countres, Economc Journal 116, C63-C76. Hanushek, E., and L. Wößmann (2008), The role of cogntve sklls n economc development, Journal of Economc Lterature, 46 (3), 607-668. Jacob, B., and J. Ludwg, (2008), Improvng educatonal outcomes for poor chldren, NBER Workng Paper, 14550. 21

Kerckhoff, A., K. Fogelman, D. Crook, and D. Reeder (1996), Gong Comprehensve n England and Wales. A Study of Uneven Change, London: The Woburn Press. Kovunen, S. (2007), Suomalasmesten kogntvsen kykyprofln muutokset 1988-2001. Flynnn efektä suomalasessa anestossa? [Changes n cogntve skll profle among Fnnsh men. Flynn effect n Fnnsh data?], Master s thess (n Fnnsh), Unversty of Jyväskylä. Krstensen P., and T. Bjerkedal (2007), Explanng the Relaton Between Brth Order and Intellgence, Scence 316 (5832), 1717. Lazear, E. (2001), Educatonal Producton, Quarterly Journal of Economcs 116 (3), 777-803. Lndqvst, E., and R. Vestman (2011), The labor market returns to cogntve and noncogntve ablty: Evdence from Swedsh enlstment, Amercan Economc Journal: Appled Economcs, 3 (1), 101-128. Mannng, A., and J. Pschke (2006), Comprehensve versus selectve schoolng n England n Wales: What do we know? NBER Workng Paper No. 12176. Meghr, C., and M. Palme (2005), Educatonal reform, ablty, and parental background, Amercan Economc Revew 95 (1), 414-424. OECD (2003), Learnng for Tomorrow s World: Frst Results from PISA 2003, OECD, Pars. Pekkarnen, T., R. Uustalo, and S. Pekkala (2009), School trackng and ntergeneratonal ncome moblty: Evdence from the Fnnsh comprehensve school reform, Journal of Publc Economcs 93, 965-973. Statstcs Fnland (1986), Structure of populaton and vtal statstcs, Offcal Statstcs of Fnland VI A: 150, Central Statstcal Offce of Fnland, Helsnk 1986. Thonen, J., J. Haukka, M. Henrksson, M. Cannon, T. Keseppä, I. Laaksonen, J. Snvuo, and J. Lönnqvst (2005), Premorbd ntellectual functonng n bpolar dsorder and schzophrena: Results from a cohort study of male conscrpts, Amercan Journal of Psychatry, 162, 1904-1910. 22

Uustalo, R. (1999), Return to educaton n Fnland, Labour Economcs, 6, 569-580. Waldnger, F. (2006), Does trackng affect the mportance of famly background on students' test scores? mmeo, London School of Economcs. Wolfers, J. (2006), Dd unlateral dvorce laws rase dvorce rates? A reconclaton and new results, The Amercan Economc Revew, 96 (5), 1802-1820. Wooldrdge, J. M. (2002), Econometrc Analyss of Cross Secton and Panel Data. Cambrdge, MA: MIT Press. 23

Fgure 1 Fnnsh school systems before and after the comprehensve school reform Unversty Unversty 18 Upper 17 secondary 16 school Vocatonal school Upper secondary school Vocatonal school 15 14 General Cvc school 13 secondary school 12 Comprehensve school 11 10 9 8 7 Prmary school Age Before reform After reform 24

Fgure 2 The mplementaton of the comprehensve school reform across regons 1972-1977 25

Fgure 3 Dstrbuton of the test scores Frequency 0 4000 8000 12000 0 10 20 30 40 Math test; mean = 18.8 sd = 8.04 Frequency 0 4000 8000 12000 0 10 20 30 40 Verbal test; mean = 22.5 sd = 7.49 Frequency 0 4000 8000 12000 0 10 20 30 40 Logcal reasong; mean = 23.8 sd = 5.86 Frequency 0 4000 8000 12000 0 10 20 30 40 Average score; mean = 21.7 sd = 6.3 26

Fgure 4 The effect of the school reform on the test score average, estmaton around the reform date Effect on Test Score Average -.1 -.05 0.05.1.15.2.25 The Effect of School Reform on Test Score Average Leads and Lags Ref-3 Ref-2 Ref-1 Reform Ref+1 Ref+2 Ref+3 Ref+4 Tme Relatve to Reform + 95% /- 95% Confdence Interval Pont Estmate Notes: The estmates n ths graph represent a regresson of the average test score on the cohort and mplementaton regon dummes, age-at-test dummes, as well as separate dummes for the leads (up to 4 years after reform) and lags (up to 3 years before reform) of the reform mplementaton. The omtted category s Reform-1. The plotted ponts are the estmates on the lead and lag dummes, and 95% confdence ntervals are shown around the pont estmates. Standard errors are clustered at the mplementaton regon level and the crtcal values from a t-dstrbuton wth 4 degrees of freedom are used for nference. 27

Table 1 Standardzed average test score by mplementaton regon and brth cohort Regon (Year when comprehensve school reform was mplemented) Brth cohort 1972 1973 1974 1975 1976 1977 Total 1962-0.21-0.22-0.16-0.10-0.01 0.19-0.09 [2,634] [3,895] [5,693] [5,468] [5,668] [3,019] [26,377] 1963-0.10-0.13-0.07-0.02 0.09 0.24 0.00 [2,896] [4,339] [6,346] [6,468] [6,496] [3,694] [30,239] 1964-0.11-0.16-0.06-0.01 0.06 0.23-0.01 [2,865] [4,299] [6,238] [6,483] [6,723] [3,977] [30,585] 1965-0.08-0.11-0.06-0.00 0.11 0.22 0.02 [2,715] [4,036] [5,995] [6,304] [6,290] [3,889] [29,229] 1966-0.01-0.04 0.01 0.03 0.14 0.29 0.07 [2,185] [3,314] [5,117] [5,579] [5,550] [3,590] [25,335] Total -0.11-0.14-0.07-0.02 0.08 0.24 0.00 [13,295] [19,883] [29,389] [30,302] [30,727] [18,169] [141,765] Parental educaton 0.48 0.46 0.50 0.51 0.59 0.69 0.53 Parental ncome 13,994 13,753 14,391 14,883 15,770 23,393 15,747 Notes: Table reports the average scores n the Fnnsh Army Basc Sklls test data by regon and cohort. Regons are defned by the year when the comprehensve reform took place n the regon (see Fgure 2). The unweghted average of the three subtest scores (math, verbal and logcal reasonng) s standardzed so that the sample mean s zero and the standard devaton s one. The number of observatons n each cell s reported n square brackets, below the mean score. The shaded areas ndcate cohorts that were affected by the post-reform educatonal system. The last two rows of the table report the share of parents wth at least 12 years of educaton and average ncome of parents from the 1970, -75 and -80 census data nflated to 2002 euros usng the consumer prce ndex. 28

Table 2 The effect of the school reform on the test score average (1) (2) (3) (4) Baselne Regon & cohort Regonal trends Famly fxed effects Reformed school -0.095 0.010 0.025 0.023 [-0.260,0.070] [-0.017,0.038] [-0.009,0.058] [-0.023,0.069] Brth year 1963 0.027 0.034-0.009 [0.001,0.053] [0.003,0.064] [-0.048,0.031] Brth year 1964 0.030 0.043-0.020 [0.002,0.057] [0.000,0.086] [-0.074,0.033] Brth year 1965 0.061 0.082-0.026 [0.031,0.092] [0.024,0.141] [-0.098,0.046] Brth year 1966 0.078 0.108-0.046 [0.044,0.112] [0.032,0.183] [-0.138,0.047] Reform regon 1973-0.034-0.034 [-0.066,-0.001] [-0.067,-0.001] Reform regon 1974 0.043 0.045 [0.012,0.074] [0.014,0.077] Reform regon 1975 0.081 0.087 [0.049,0.114] [0.053,0.120] Reform regon 1976 0.193 0.201 [0.158,0.228] [0.165,0.238] Reform regon1977 0.343 0.355 Ln. trend x regon 1973 Ln. trend x regon 1974 Ln. trend x regon1975 Ln. trend x regon 1976 Ln. trend x regon1977 [0.302,0.383] [0.311,0.398] -0.002-0.017 [-0.026,0.022] [-0.046,0.012] -0.008 0.002 [-0.031,0.015] [-0.027,0.030] -0.019-0.027 [-0.043,0.006] [-0.058,0.005] -0.010-0.023 [-0.035,0.014] [-0.054,0.009] -0.016-0.038 [-0.041,0.009] [-0.071,-0.004] Constant 4.557 4.343 4.315 3.904 R-squared 0.067 0.074 0.074 0.048 Observatons 141,765 141,765 141,765 141,765 Notes: Sample ncludes brth cohorts 1962-66. The dependent varable s the unweghted average of the three subtest scores (math, verbal and logcal reasonng) scaled nto standard devaton unts. All regressons nclude 13 age-at-test-dummes. Column 2 adds brth cohort and mplementaton regon fxed effects. Column 3 further adds regon-specfc lnear trends. Column 4 s estmated wth cohort fxed effects, regonal trends, age-at-test dummes and famly fxed effects. Standard errors are clustered at the mplementaton regon level and the crtcal values from a t-dstrbuton wth 4 degrees of freedom are used for nference. As non-standard crtcal values are used, the 95% confdence ntervals are reported n parentheses below the pont estmates. 29

Table 3 The effect of the school reform n dfferent tests (1) (2) (3) Regon & cohort Regonal trends Famly fxed effects Math test 0.002 0.015 0.011 [-0.025,0.030] [-0.019,0.048] [-0.037,0.058] Verbal test 0.023 0.043 0.030 [-0.004,0.051] [0.009,0.077] [-0.018,0.078] Logcal reasonng 0.006 0.011 0.027 [-0.022,0.033] [-0.023,0.045] [-0.025,0.078] Notes: Sample ncludes brth cohorts 1962-66, n=141,765. Each cell of the table corresponds to a separate regresson. Each subtest score s scaled nto standard devaton unts. The entres n the table represent the coeffcents of a dummy varable ndcatng that the person attended the reformed comprehensve school. Column 1 ncludes cohort and mplementaton regon fxed effects, and age-at-test dummes. Column 2 adds regonal trends. Column 3 s estmated wth cohort dummes, regonal trends, age-at-test dummes and famly fxed effects. Standard errors are clustered at the mplementaton regon level and the crtcal values from a t-dstrbuton wth 4 degrees of freedom are used for nference. The 95% confdence ntervals are reported n parentheses below the pont estmates. 30

Table 4 The effect of the school reform on the mean and the varance of the test scores OLS estmates (1) (2) (3) (4) Math test Verbal test Logcal reasonng test Average score n all 3 tests Effect on the mean 0.015 0.043 0.011 0.025 ML estmates [-0.019,0.048] [0.009,0.077] [-0.023,0.045] [-0.009,0.058] Effect on the mean 0.015 0.042 0.013 0.025 [-0.019,0.049] [0.008,0.076] [-0.021,0.046] [-0.009,0.058] Effect on the log -0.007-0.025-0.025-0.024 Varance [-0.056,0.043] [-0.074,0.024] [-0.074,0.025] [-0.073,0.026] Notes: Sample ncludes brth cohorts 1962-66, n=141,765. The entres n the table represent the coeffcents of a dummy varable ndcatng that the person attended the reformed comprehensve school. Each regresson model ncludes cohort and mplementaton regon fxed effects, regonal trends, and age-at-test dummes. The 95% confdence ntervals are reported n parentheses below the pont estmates 31

Table 5 The effect of the school reform n dfferent tests allowng for leads and lags (1) Math test (2) Verbal test (3) Logcal reasonng test (4) Average score n all 3 tests Reform - 3 years -0.003-0.032-0.034-0.023 [-0.051,0.046] [-0.080,0.017] [-0.083,0.015] [-0.071,0.025] Reform - 2 years 0.006 0.005-0.001 0.005 [-0.027,0.040] [-0.028,0.039] [-0.035,0.032] [-0.028,0.038] Reform 0.016 0.055 0.020 0.032 [-0.021,0.053] [0.017,0.092] [-0.017,0.058] [-0.004,0.069] Reform + 1 year 0.012 0.054 0.044 0.038 [-0.041,0.064] [0.000,0.107] [-0.010,0.097] [-0.015,0.090] Reform + 2 years 0.003 0.070 0.045 0.040 [-0.069,0.075] [-0.002,0.143] [-0.028,0.118] [-0.032,0.112] Reform + 3 years 0.024 0.098 0.083 0.071 [-0.070,0.117] [0.004,0.193] [-0.012,0.177] [-0.022,0.165] Reform + 4 years 0.050 0.143 0.082 0.099 [-0.072,0.171] [0.020,0.265] [-0.041,0.205] [-0.023,0.220] Constant 3.120 2.856 4.579 4.305 Observatons 141,814 142,049 142,084 141,765 Notes: Sample ncludes brth cohorts 1962-66. Each column corresponds to a separate regresson. Rows report the estmated coeffcents of dummes for 3 and 2 years pror to the reform, as well as, for the mmedate effect of the reform and 1, 2, 3, and 4 years after the reform. The omtted category s reform year 1. Each regresson ncludes cohort and mplementaton regon fxed effects, and age-at-test dummes. Standard errors are clustered at the mplementaton regon level and the crtcal values from a t-dstrbuton wth 4 degrees of freedom are used for nference. The 95% confdence ntervals are reported n parentheses below the pont estmates 32

Table 6A Effect of the school reform on average test score by parents educaton Hgh educated parents (1) Cohort and regon dummes (2) Regon specfc lnear trends (3) Cohort x regon nteractons (4) Full nteractons wth parental educaton 0.275 0.275 0.276 0.187 [0.248,0.303] [0.247,0.303] [0.248,0.304] [0.045,0.329] Reform 0.031 0.047 [-0.003,0.065] [0.007,0.086] Reform -0.035-0.035-0.036-0.031 hgh educated [-0.070,-0.001] [-0.069,-0.000] [-0.071,-0.002] [-0.089,0.027] parents Constant 2.270 2.237 2.920 2.962 Observatons 126,977 126,977 126,977 126,977 R-squared 0.092 0.092 0.092 0.092 Table 6B Effect of the school reform on average test score by parents ncome (1) Cohort and regon dummes (2) Regon specfc lnear trends (3) Cohort x regon nteractons (4) Full nteractons wth parental ncome Parents ncome 0.325 0.324 0.327 0.246 [0.299,0.352] [0.298,0.351] [0.300,0.354] [0.109,0.384] Reform 0.014 0.029 [-0.015,0.042] [-0.006,0.064] Reform -0.034-0.033-0.036 0.002 parents ncome [-0.066,-0.002] [-0.065,0.000] [-0.069,-0.003] [-0.053,0.058] Constant 4.283 4.256 3.075 3.080 Observatons 126,891 126,891 126,891 126,891 R-squared 0.101 0.101 0.102 0.102 Notes: Sample ncludes brth cohorts 1962-66. Each column corresponds to a separate regresson. The dependent varable s the unweghted average of the three subtest scores (math, verbal and logcal reasonng) scaled nto standard devaton unts. Parents educaton s a dummy varable ndcatng that at least one parent had a degree hgher than compulsory educaton. Parents ncome s the log of average annual taxable ncome of parents from the 1970, -75 and -80 census data nflated to the 2002 prce level usng the consumer prce ndex. Parents ncome s measured as devaton from the mean log parents ncome s used so the reform effect can be nterpreted as the effect at the mean ncome level. Column 1 ncludes cohort and mplementaton regon fxed effects, and age-at-test dummes. Column 2 adds lnear regonal trends. Column 3 adds dummes for the nteractons between brth cohort and mplementaton regon. Column 4 further adds the nteractons of parental educaton (or ncome) wth 1) brth cohort dummes, 2) mplementaton regon dummes and 3) age-at-test dummes. Standard errors are clustered at the mplementaton regon level and the crtcal values from a t-dstrbuton wth 4 degrees of freedom are used for nference. The 95% confdence ntervals are reported n parentheses below the pont estmates. 33

Appendx 1. Subsecton scores Table A1: Subsecton scores n the Army test by cohort and regon Math score Reform year Brth cohort 1972 1973 1974 1975 1976 1977 Total 1962-0.20-0.19-0.14-0.09 0.01 0.15-0.08 1963-0.11-0.10-0.06-0.00 0.08 0.22 0.01 1964-0.12-0.14-0.06-0.01 0.07 0.21-0.01 1965-0.08-0.11-0.06 0.00 0.09 0.21 0.01 1966-0.01-0.02 0.01 0.04 0.11 0.26 0.07 Total -0.11-0.12-0.06-0.01 0.07 0.21 0.00 Verbal score Reform year Brth cohort 1972 1973 1974 1975 1976 1977 Total 1962-0.18-0.19-0.12-0.10-0.01 0.16-0.08 1963-0.06-0.13-0.04-0.03 0.08 0.19 0.00 1964-0.10-0.14-0.04-0.00 0.05 0.19-0.00 1965-0.05-0.10-0.03-0.01 0.11 0.16 0.02 1966 0.01-0.04 0.01 0.02 0.11 0.24 0.06 Total -0.08-0.12-0.04-0.02 0.07 0.19-0.00 Logcal reasonng score Reform year Brth cohort 1972 1973 1974 1975 1976 1977 Total 1962-0.18-0.21-0.16-0.08-0.02 0.20-0.08 1963-0.08-0.12-0.10-0.01 0.06 0.24-0.01 1964-0.08-0.15-0.07-0.02 0.06 0.23-0.00 1965-0.08-0.09-0.05 0.00 0.09 0.22 0.02 1966-0.02-0/04 0.01 0.03 0.15 0.30 0.08 Total -0.09-0.13-0.08-0.02 0.07 0.24-0.00 34

Appendx 2. Issues related to the standard errors of the estmates In the man parts of the paper we report standard errors based on the Moulton procedure and use for statstcal nference a t-dstrbuton wth G-2 degrees of freedom, where G s the number of clusters (6 n our case). Clusterng s done at the regon, not at the regon/cohort level. We also make a small sample adjustment suggested by Cameron, Mller and Gelbach (2008) nflatng the resduals by G /( G 1). Snce we use non-standard crtcal values we report conddence ntervals nstead of standard errors below the estmates. After correctng the standard errors for clusterng, the remanng concern s whether the cluster correcton works gven the small number of clusters and the potental seral correcton of the errors. To assess the severty of the problem we followed the example of Bertrand, Duflo and Mullanathan (2004) and calculated mean resduals from equaton 1 by cohort and regon. We then estmated autocorrelatons by an OLS regresson of mean resduals on the lagged mean resduals. The resultng estmates for frst order autocorrelatons were low:.03 for the average score,.11 for math, -.09 for verbal and.07 for the logcal reasonng test. These estmates may naturally be downward based due to the short tme seres. However, testng the null of no autocorrelaton by regressng the frst-dfferenced resduals on ther lagged values, as suggested by Wooldrdge (2002), ndcates no sgnfcant frst order autocorrelaton. In any case, even somewhat larger autocorrelaton coeffcents (e.g. around.2) only lead to modest over-rejecton rates n the Monte Carlo smulatons of Bertrand et al. (2004), Table 2. For the sake of completeness we also calculated bootstrapped standard errors usng the procedures descrbed n Cameron et al. (2008). As the data are clustered by regon, we used the block bootstrap and wld bootstrap that retan the cluster structure n the data and can be used wth unequal cluster szes. We dd ths bootstrappng the t-statstc, as recommended by Cameron et al (2008). In the tables below, we compare the confdence ntervals obtaned from the varous estmaton procedures for the key parameters of our paper. We report the confdence ntervals based on OLS, Stata cluster correcton (wth Stata default opton that uses crtcal values from a t-dstrbuton wth G- 1 degrees of freedom), and Moulton correcton (wth the small sample correcton and crtcal values from a t-dstrbuton wth G-2 degrees of freedom). Fnally, we also report bootstrapped estmates wth 200 replcatons usng the block bootstrap-t and the wld bootstrap-t. Our concluson from these experments s that the OLS confdence ntervals are clearly too narrow n the frst column of Table A2 when no fxed effects are ncluded. Other methods produce results that are qualtatvely smlar to each other, although the bootstrapped confdence ntervals are somewhat wder. Once the fxed effects are ncluded (Column 2), the confdence ntervals produced by dfferent 35

cluster correcton procedures are qute smlar, and even OLS confdence ntervals appear to be qute accurate. Our nterpretaton s that ths s manly due to the low degree of seral correlaton n the data. Confdence ntervals based on the Moulton procedure appear to be the most conservatve choce. Table A2: Confdence ntervals for Table 2, effect of the reform on average score Column 1 (b = -.095) Column 2 (b =.010) Column 3 (b =.025) Method Lower 95% CI Upper 95% CI Lower 95% CI Upper 95% CI Lower 95% CI Upper 95% CI OLS -.106 -.085 -.008.028.003.046 Cluster corrected -.258.068 -.008.028.010.040 Moulton -.260.070 -.017.038 -.009.058 Block bootsrap-t Wld bootsrap-t -.228.136 -.017.022 -.022.089 -.329.138 -.007.027.011.039 For the other tables the conclusons from our expermentaton are very smlar. Confdence ntervals based on the OLS standard errors are typcally only slghtly narrower than the confdence ntervals based on the parametrc cluster correctons. Also, the wld bootstrap-t advocated by Cameron et al. (2008) produces confdence ntervals that are close to the confdence ntervals based on the usual cluster correcton methods. The only method producng qualtatvely dfferent results s the block bootstrap-t that n some cases generates extremely wde confdence ntervals. Ths fndng resembles closely the Monte Carlo results of Cameron et al. (p.423), who note that the block bootstrap severely under-rejects the null when there are only a few clusters. Ther explanaton s that n some bootstrap replcatons only one treatment or control regon s sampled, and for these replcatons the treatment dummy produces a perfect ft and zero resduals. The resultng Wald statstcs n these re-samples are very large, whch results n severe under-rejecton rates. Our fndngs are smlar. Wald estmates n some resamples are very large and hence the bootstrap confdence ntervals very wde. Wth the excepton of the block bootstrap-t our experments wth bootstrappng ndcate that the estmated confdence ntervals are rather robust to the choce of cluster correcton method. Agan ths s most lkely due to the low degree of seral correlaton n the data. 36

Table A3: Confdence ntervals for Table 3, effect of the reform n the subtest scores Wthout regonal trends (Column 1) Method Math (b =.002) Verbal (b =.023) Logcal (b =.006) Upper 95% Lower Upper 95% Lower CI 95% CI CI 95% CI Lower 95% CI Upper 95% CI OLS -.016.020.006.041 -.012.024 Cluster -.011.015 -.002.049 -.021.032 Moulton -.025.030 -.004.051 -.022.033 Beta_block_t -.020.019 -.071.043 -.041.027 Beta_wld_t -.010.015 -.007.052 -.018.053 Wth regonal trends (Column 2) Math (b =.015) Verbal (b =.043) Logcal (b =.011) Method Lower 95% CI Upper 95% CI Lower 95% CI Upper 95% CI Lower 95% CI Upper 95% CI OLS -0.007.036.021.065 -.011.033 Cluster 0.002.027.024.062 -.017.039 Moulton -0.019.048.009.077 -.023.045 Beta_block_t -0.021.027.000.077 -.137.079 Beta_wld_t 0.005.024.023.064 -.028.050 Table A4: Confdence ntervals for Table 6A, nteracton of reform and parents educaton Method Column 1 (b = -.035) Column 2 (b = -.035) Column 3 (b = -.036) Lower 95% CI Upper 95% CI Lower 95% CI Upper 95% CI Lower 95% CI Upper 95% CI OLS -.057 -.013 -.057 -.013 -.058 -.014 Cluster -.072.001 -.070.000 -.073.001 Moulton -.070 -.001 -.070.000 -.071 -.002 Beta_block_t -.074.097 -.068.078 -.072.094 Beta_wld_t -.074.004 -.071.001 -.076.004 Table A5: Confdence ntervals for Table 6B, nteracton of reform and parents ncome Column 1 (b = -.034) Column 2 (b = -.033) Column 3 (b = -.036) Method Lower 95% CI Upper 95% CI Lower 95% CI Upper 95% CI Lower 95% CI Upper 95% CI OLS -.054 -.013 -.053 -.012 -.057 -.015 Cluster -.091.024 -.090.025 -.099.026 Moulton -.066 -.002 -.065.000 -.069 -.003 Beta_block_t -.081.064 -.081.062 -.093.081 Beta_wld_t -.097.030 -.091.025 -.103.031 37

Appendx 3 Subsample results The data used n the man parts of ths paper nclude all conscrpts who were born between 1962 and 1966 and who were found n the Army database. The army test scores are n the regster from 1982 onwards. As t s possble to enter to mltary servce as a volunteer before age 20, some men n the oldest cohorts served before the Army regster was created. Ths generates some selectvty n the data. In the fnal verson of the paper we used the full sample and controlled for age at test. Controllng for age at test was done because age mght have an ndependent effect on the test scores and because age of enterng nto the army s correlated wth educaton. Volunteerng nto the army at age 19 s often convenent for hgh school graduates. Also educaton s one of the most common reasons for applyng for deferment. In the data, the average test scores are hgher for those who take the test before or after age 20 compared to those who take the test at age 20. However, controllng for age at test s potentally a problematc soluton f age at test s correlated wth educaton and the comprehensve school reform affects the length of educaton, thereby makng also age at test an endogeneous varable. Controllng for an endogenous varable or lmtng the sample based on such varable would generally lead nto based results. As a potental soluton for the selectvty problem we restrcted the data to men born between 1964 and 1966,.e. those whose test scores we can observe even f they volunteered for early servce at age 18 or 19. In order to balance the sze of the treatment and the control groups n ths lmted sample the regons where the reform was mplemented between 1972 and 1974 were also excluded from the analyss. The latter restrcton also lessened the problem caused by unknown treatment status of persons who lved n dfferent muncpalty n 1970 than n 1975. Such dfferences were relatvely frequent because of mgraton and muncpalty mergers. The nternal mgraton rates peaked n early 1970 s and were about 25% hgher n 1971-75 compared to 1976-80. In addton, altogether 71 of 518 muncpaltes that had exsted n 1970 merged wth ther neghbourng muncpaltes between 1971 and 1975. Restructurng of local governments was much less ntensve n late 1970s and hence the muncpalty codes that were used to determne the treatment status were more stable between 1975 and 1980. In Tables A6, A7 and A8 we report the results from the restrcted sample. The estmates are generally slghtly larger than those presented n the man parts of the paper but qualtatvely the restrctons made lttle dfference. Eventually, these estmates were dropped from the verson submtted for publcaton manly because we calculated standard errors clustered by regon and these restrctons would have resulted to even lower number of clusters. 38

Table A6: Reform effects usng cohorts 1964-1966 (1) (2) (3) (4) VARIABLES Math Verbal Logcal Average score Reasonng Reform 0.0256 0.0788*** 0.0264 0.0479* (0.0248) (0.0247) (0.0247) (0.0247) Constant 2.548*** 3.238*** 4.109*** 3.434*** Observatons 48,397 48,470 48,476 48,385 Notes: In all columns the model ncludes a full set of dummy varables for regon and cohort and a set of dummy varables for age on the test date. Regressons also control for lnear regonal trends n test scores. Standard errors are clustered at the regon level. Table A7: Effect of the reform on average test score by parents educaton, cohorts 1964-1966 (1) (2) (3) Cohort x regon nteractons Regon-specfc lnear trends Full nteractons wth parental educaton Hgh ed. parents 0.315*** 0.316*** 0.0644 (0.0178) (0.0178) (0.0717) Reform 0.122*** (0.0286) Reform -0.0713*** -0.0727*** 0.0156 hgh ed. parents (0.0209) (0.0210) (0.0353) Constant 3.083 3.164*** 3.104*** Observatons 42,991 42,991 42,991 Table A8: Effect of the reform on average test score by parents ncome, cohorts 1964-1966 (1) (2) (3) Cohort x regon nteractons Regon-specfc lnear trends Full nteractons wth parental ncome Parental ncome 0.345*** 0.347*** 0.116* (0.0182) (0.0183) (0.0662) Reform 0.0734*** (0.0255) Reform -0.0366* -0.0398* 0.0692* parental ncome (0.0212) (0.0214) (0.0354) Constant 3.543*** 3.249*** 3.245*** Observatons 3.543*** 3.249*** 3.245*** Notes: Parents ncome s the average log ncome of parents from the 1970, -75 and -80 census data nflated to the 1980 prce-level usng the consumer prce ndex. Devaton from the mean log parents ncome was used so that the effect of the reform can be nterpreted as the effect at the mean ncome level. 39