Reconstruction of Upper-Level Temperature and Geopotential Height Fields for the Northern Extratropics back to 1920




Technical report

Thomas Griesser, Stefan Brönnimann, Andrea Grant, Tracy Ewen, Alexander Stickler

Institute for Atmospheric and Climate Science, ETH Zurich, Universitätstr. 16, CH-8092 Zurich, Switzerland

8 June 2008

Abstract

We present reconstructions of upper-level geopotential height (GPH) and temperature up to 100 hPa for the northern extratropics. The reconstructions are based on a large amount of historical upper-air data as well as information from the Earth's surface. They cover the period 1920-1957. The ERA-40 reanalysis is used to calibrate the statistical models used in the reconstruction process. This report describes the data used for the reconstruction, the reconstruction method, and the validation experiments. Validations were performed within the calibration period (split-sample validations) as well as in the reconstruction period, using independent historical upper-air data. The validation results suggest excellent skill for GPH (up to 100 hPa) in the winter season. The skill is slightly worse for lower- to mid-tropospheric temperature in winter and is lower for GPH in summer. Care should be taken when analyzing temperature for the summer season, when the skill is relatively low. The data are made available to the public. The reconstructed fields as well as accompanying fields of the Reduction of Error (RE) can be downloaded from: http://www.iac.ethz.ch/en/climatology/reconstructions.html

1. Introduction

For the study of interannual-to-decadal climate variability in the 20th and late 19th centuries, a variety of global gridded surface datasets of different variables are available at monthly or daily resolution (e.g., HadSLP2, Allan and Ansell 2006; CRU TS 2.1, Mitchell and Jones 2005; HadCRUT3, Brohan et al. 2006; HadISST, Rayner et al. 2003; ERSST, Smith and Reynolds 2004). Although many phenomena can be addressed to some extent with surface data, the analysis remains limited as long as no upper-air data are available. Datasets consisting of direct upper-air measurements, such as those from radiosondes or pilot balloons, exist for the second half of the 20th century (e.g., CARDS, Eskridge et al. 1995, see also Lanzante et al. 2003; HadRT/HadAT, Parker et al. 1997, Thorne et al. 2005; IGRA, Durre et al. 2006). Parts of these datasets, supplemented with additional information from the surface and from satellites, were assimilated into weather prediction models to generate what are probably the most widely used 3D datasets: ERA-40 and the NCEP/NCAR reanalysis (Uppala et al. 2005, Kistler et al. 2001). Although the continuously updated NCEP/NCAR reanalysis now provides data for the past 60 years (1948-2008), there is still a general lack of knowledge about the variability of the upper-level circulation in earlier times and on interdecadal timescales.

Several authors have tried to complete the picture of the upper-level circulation for the first half of the 20th century. Klein and Dai (1998) presented a method to statistically reconstruct 700 hPa geopotential heights (GPH) for North America, extending to the Pacific and Atlantic, using surface air temperature (SAT) and sea level pressure (SLP) data. Schmutz et al. (2001) reconstructed 700, 500 and 300 hPa GPH for the European and eastern North Atlantic region based on SLP, SAT and precipitation (RR) data. Gong et al. (2006) derived 500 hPa GPH for the Northern Hemisphere based on SLP and SAT fields.
All of these studies share one major shortcoming: they do not include any upper-air measurements. Brönnimann and Luterbacher (2004) pointed to the availability of upper-air measurements before 1948. They reconstructed 700, 500, 300 and 100 hPa GPH and temperature using ground-based and newly available upper-air measurements. In this paper we extend and refine this approach and present statistical reconstructions of GPH and temperature for the extratropical Northern Hemisphere (15°N-90°N). The dataset consists of monthly reconstructions at the 850, 700, 500, 300, 200 and 100 hPa levels for the period 1920-1957, which allows a seamless connection to the ERA-40 reanalysis.

2. Data

For the reconstruction we define two major time periods: the calibration/validation period (1957-2002) and the reconstruction period (1920-1957; hereafter: historical period). Like every regression model, the multiple linear regression model requires a predictand (Y) and a predictor (X) dataset. In this reconstruction approach the predictand consists of the upper-level temperature and GPH fields in the reconstruction period. The predictor comprises the upper-level and ground-based measurements spanning the whole period from 1920 to 2002. A third, independent dataset is required for the validation in the reconstruction period, although a first quality assessment is already performed within the calibration period using cross-validation. In the following sections the three datasets are briefly described and their quality is discussed. For a more detailed discussion of each dataset the reader is referred to the literature cited.

2.1. Predictand data in the calibration/validation period

As predictand we need a long, global and homogeneous 3D dataset. The two most frequently used datasets are the ERA-40 and NCEP/NCAR reanalyses. The NCEP/NCAR reanalysis starts in 1948 and is continuously updated to the present (Kistler et al. 2001). The ERA-40 reanalysis starts in 1957 and ends in 2002 (Uppala et al. 2005). The operational forecasting system and assimilation procedure of the NCEP/NCAR reanalysis were designed in the mid-1990s, while the core of the ERA-40 reanalysis was developed after 2000. The two reanalyses therefore belong to two different generations. In direct comparisons the ERA-40 reanalysis clearly outperforms the NCEP/NCAR reanalysis (Simmons et al. 2004, Santer et al. 2004, Bengtsson et al. 2004). For this reason we choose the ERA-40 reanalysis as predictand for the reconstruction.

Although most deficits apparent in the NCEP/NCAR reanalysis were removed in the ERA-40 reanalysis, some problems remain unsolved. In the Southern Hemisphere the data coverage is still poor in the early years, especially before 1967 (Uppala et al. 2005). Differences in the bias correction of satellite measurements cause small jumps in the mean tropospheric temperatures, with the largest inhomogeneity expected around 1975/76. In the pre-satellite years the extratropical Southern Hemisphere exhibits a cold tropospheric bias (Bengtsson et al. 2004). In the same years a cold bias is apparent in the Antarctic lower stratosphere in winter and spring. The disadvantage of the shorter time period, compared to the NCEP/NCAR reanalysis, is expected to be at least compensated by the increased data quality.
Despite the above-mentioned problems with the ERA-40 reanalysis, there are good reasons to use it as predictand. Our reconstruction approach primarily focuses on spatial variability patterns for the whole troposphere and the lowermost stratosphere and is therefore less affected by inhomogeneities in either a subregion or a specific layer. Furthermore, the month-to-month variability is large relative to the observed jumps. Inhomogeneities in the predictand dataset only affect the quality of the reconstruction to the extent that they project onto naturally occurring patterns of variability. Also, they do not normally introduce trends in the reconstruction period. The quality of the reconstruction can be assessed with a statistical bootstrap procedure in the calibration period and additionally with the independent validation data in the reconstruction period. Finally, the reported shortcomings of the ERA-40 reanalysis are largest in the Southern Hemisphere and do not influence the reconstructions in the northern extratropics.

We use monthly mean fields of geopotential height (GPH) and temperature at the 850, 700, 500, 300, 200 and 100 hPa levels (hereafter termed Z850, T850, Z700, T700, etc.), interpolated to an equal-area grid. Hence, the number of grid points on a latitude circle decreases towards the poles. The spacing along a meridian is kept constant at 2.5°.
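The equal-area grid described above can be illustrated with a minimal sketch. The report does not give the exact construction, so the rule used here (scaling the number of longitudes on each latitude circle by the cosine of latitude, starting from 144 points at the equator) and the function name `equal_area_grid` are assumptions made for illustration only.

```python
import numpy as np

def equal_area_grid(lat_min=15.0, lat_max=90.0, spacing_deg=2.5):
    """Return an (n_points, 2) array of (lat, lon) pairs.

    Assumption: latitude spacing is constant and the number of longitudes
    on each latitude circle scales with cos(latitude), so that every grid
    point represents roughly the same area.
    """
    lats = np.arange(lat_min, lat_max + spacing_deg / 2, spacing_deg)
    n_equator = int(round(360.0 / spacing_deg))  # 144 points for 2.5 degrees
    points = []
    for lat in lats:
        # Fewer points towards the pole, but always at least one.
        n_lon = max(1, int(round(n_equator * np.cos(np.radians(lat)))))
        lons = np.arange(n_lon) * (360.0 / n_lon)
        points.extend((lat, lon) for lon in lons)
    return np.array(points)

grid = equal_area_grid()
```

With this rule the latitude circle at 60°N holds half as many points as the equator would, which is what keeps the area per point approximately constant.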

2.2. Predictor data

The predictor data can be divided into two major groups: surface data and upper-air data. The surface data consist of gridded sea level pressure (SLP) (HadSLP2; Allan and Ansell 2006) and homogenized surface station temperature data (GISSTEMP) (NASA-GISS; Hansen et al. 1999). The SLP dataset is incorporated as is, with a spatial resolution of 5° by 5°, spanning 1920 to 2002. For the surface temperature predictors, stations with high data quality and good spatial and temporal coverage are preferred. Therefore, the GISSTEMP dataset was reduced according to the following criteria. First, all stations with less than 90 percent of available data in the calibration period 1957-2002 were eliminated. Second, we calculated the Pearson correlation between the temperature anomalies of each station and the ERA-40 reanalysis 925 hPa temperature anomalies, interpolated to the station location. Stations with a correlation <0.8 were removed. Third, because the US is still overrepresented relative to other regions, which is potentially problematic with regard to the later weighting, the station network over the US was reduced further: US stations with an incomplete record in the 20th century were discarded. Based on these criteria, a subset of 613 stations in total was extracted, covering the period from 1920 to 2002 (for the location of the surface temperature stations and the temporal evolution of the predictors see Fig. 1). After subtracting the annual cycle, based on the period 1961 to 1990, and standardizing, the few remaining missing data points in the calibration period in the reduced GISSTEMP dataset were filled with standardized 925 hPa anomalies from the ERA-40 reanalysis in order to obtain complete data series. Brönnimann and Luterbacher (2004) showed that this is justified by the high correlation of 0.85 between the reanalysis and the station series.
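The first two screening criteria for the surface stations (at least 90 percent data availability and a correlation of at least 0.8 with the reanalysis) can be sketched as follows. This is an illustration only, not the authors' code: the function name `screen_stations`, the array layout, and the use of NaN for missing values are assumptions.

```python
import numpy as np

def screen_stations(station_anom, reanalysis_anom, min_coverage=0.9, min_corr=0.8):
    """Return a boolean mask of stations that pass both screening criteria.

    station_anom:    (n_months, n_stations) station anomalies, NaN = missing
    reanalysis_anom: (n_months, n_stations) reanalysis anomalies interpolated
                     to the station locations (hypothetical input layout)
    """
    available = ~np.isnan(station_anom)
    coverage = available.mean(axis=0)  # fraction of months with data

    n_stations = station_anom.shape[1]
    corr = np.full(n_stations, np.nan)
    for j in range(n_stations):
        ok = available[:, j] & ~np.isnan(reanalysis_anom[:, j])
        if ok.sum() > 2:
            # Pearson correlation over the months where both series exist.
            corr[j] = np.corrcoef(station_anom[ok, j], reanalysis_anom[ok, j])[0, 1]

    # Stations without a computable correlation (NaN) fail the comparison.
    return (coverage >= min_coverage) & (corr >= min_corr)
```

The additional thinning of the US network is not covered here, since the report describes it only qualitatively.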
For the upper-air data we can distinguish between measurements taken by radiosondes, kites, aircraft and pilot balloons. All upper-air measurements are from the period before 1958 (some reach back to 1920) and originate from many different sources. The radiosonde data were collected from the following archives: the Integrated Global Radiosonde Archive (IGRA) (Durre et al. 2006) and the tape deck 6201 compilation (TD-6201), both from the National Climatic Data Center (NCDC); and the United States Air Force Environmental Technical Applications Center tape deck 54 dataset (TD54) and the Comprehensive Aerological Reference Data Set tape deck 542 archive (CARDS542) (Eskridge et al. 1995), both obtained from the National Center for Atmospheric Research (NCAR). Additional historical radiosonde, aircraft and kite measurements processed at ETH Zurich were added (Brönnimann 2003a,b; Ewen et al. 2008a; Grant et al. 2008). All radiosonde datasets underwent a detailed quality control (Grant et al. 2008) and duplicates were removed. In addition, re-evaluated upper-level wind data from the global TD52 and TD53 pilot balloon datasets provided by NCAR (available online at http://dss.ucar.edu/docs/papers-scanned/papers.html, documents RJ0167, RJ0168) and from the African pilot balloon dataset of MeteoFrance are used. The pilot balloon data were checked for errors with the same procedure as described by Grant et al. (2008) for radiosonde data. In cases where a station could not be clearly accepted or rejected, mostly because the reference series was too weakly correlated, the variance and the mean of the historical period were plotted against the same quantities from the ERA-40 reanalysis at the same location. Station series with a bias of more than two standard deviations, or a difference in the variance of more than 1.5 standard deviations, between the historical period and the reanalysis were rejected if the historical time series was longer than one year. If the majority of the levels from a station were inconsistent with the reanalysis, the complete station was removed.

All upper-level series cover only a part of the pre-1958 (historical) period, and most do not reach the present time or have long gaps, because stations were relocated or closed or the measurement platform changed. For instance, no kite data are available after the 1930s, but a substitute must be found for calibration. Therefore, we use the ERA-40 reanalysis (interpolated to the station locations and degraded with noise, see below) to supplement all historical upper-level series after 1958. The only exceptions are the TD52 and TD53 datasets after 1948, which were rigorously quality-checked in a previous study (Ewen et al. 2008b) using the NCEP/NCAR reanalysis. We used the dataset from that study, i.e., supplemented with NCEP/NCAR data after 1948. The location of all upper-air predictors as well as the measurement platform is shown in Fig. 1. For reconstructing extratropical Northern Hemisphere fields, only predictor data from north of 10°N were used. In future updates, global reconstructions will be produced. In total, 13974 upper-air series were used (6632 kite, aircraft or radiosonde; 7342 pilot balloon).

The quality of historical data (especially upper-air data) is lower than that of more recent measurements. Therefore, we perturbed the predictor data after 1957 with normally distributed noise. The noise consists of a random bias for each station (i.e., constant in time) and a purely random component. The standard deviations of the normal distributions of the noise are deduced from our quality assessment (Brönnimann 2003a, Grant et al. 2008).
For upper-air temperature data we assumed a random station bias with a standard deviation of approximately 0.5 °C and a purely random component with a standard deviation of roughly 1.1 °C. For all wind data we inferred a standard deviation of 0.7 m/s for the random station bias and 1.1 m/s for the purely random part. In contrast to temperature and wind, where the error is kept constant with height, the error for GPH increases from the lower to the higher levels. For the station bias, the standard deviation grows from 7.5 gpm at the 850 hPa level to 20 gpm at the 100 hPa level; for the purely random noise, it grows from 11.5 gpm to 53 gpm. After perturbation, all predictor variables are standardized and expressed as anomalies with respect to the 1961 to 1990 annual cycle.

However, the data availability for any given month in the historical period is much more limited. Except for the SLP data, all data series have longer or shorter gaps in the historical period. A large part of the upper-air data, especially in the 1920s and 1930s, is confined to the lower troposphere, and the coverage is much better over the continents than over the oceans.
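The two-part noise model described above (a time-independent bias per station plus a purely random component) can be sketched as follows. The function name `perturb_series` and the array layout are hypothetical; the standard deviations in the usage lines are the temperature values quoted in the text.

```python
import numpy as np

def perturb_series(data, sd_bias, sd_random, rng):
    """Degrade reanalysis-based pseudo-observations with synthetic noise.

    data: (n_months, n_stations) series sampled at the station locations
    sd_bias:   standard deviation of the per-station, time-constant bias
    sd_random: standard deviation of the independent noise on each value
    """
    n_months, n_stations = data.shape
    # One bias per station, broadcast over all months (constant in time).
    station_bias = rng.normal(0.0, sd_bias, size=(1, n_stations))
    # Independent noise for every month and station.
    random_part = rng.normal(0.0, sd_random, size=(n_months, n_stations))
    return data + station_bias + random_part

rng = np.random.default_rng(42)
clean_temperature = np.zeros((552, 5))  # hypothetical: 46 years x 12 months, 5 stations
noisy_temperature = perturb_series(clean_temperature, sd_bias=0.5, sd_random=1.1, rng=rng)
```

For GPH the same function would be called once per pressure level, with standard deviations that increase with height as stated in the text.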

Fig. 1. Map of ground and upper-air stations used as predictors. Green triangles represent surface temperature stations, blue crosses denote pilot balloon stations, black circles denote upper-air series taken by radiosondes, kites and aircraft, and red circles are upper-air stations used for the validation. Inset: Time series of available predictors from 1920 to 1957, separated by measurement platform.

2.3. Validation data

For the purpose of validation, some upper-air stations are retained and not used for the reconstruction. We selected the stations according to the following criteria. First, the stations have to cover as much of the historical period as possible, preferably without gaps. Second, to keep the validation of the reconstruction independent of the quality control procedure applied to the predictors, we take only stations that did not need any correction. Based on these criteria three stations are withheld: Oakland (USA), Ellendale (USA) and Lindenberg (Germany) (see Fig. 1 for their exact positions). Additionally, the model is tested with two nearly independent datasets in a split-sample validation procedure.

3. Reconstruction method

3.1. Weighting scheme

The available historical predictors are unequally distributed in space. In general, the Earth's surface is overrepresented compared to the middle and upper troposphere, and continents are better covered than oceans. This potentially leads to a focus on small-scale variability near the surface over land masses. Hence, we have to weight the station series to better represent the whole variability present in the predictor dataset. In a first step, all data series are assigned to an altitude bin (L0: surface, L1: 250-3000 m or 925-700 hPa, L2: 3500-6000 m or 600-500 hPa, L3: 7000-9000 m or 400-300 hPa, L4: above 9000 m or 200-50 hPa). In a second step, within each level and for the variables GPH, T and wind (u- and v-winds are treated as a single variable), the average 0.5 decorrelation distance is calculated, giving an estimate of an influence radius (for the influence radii for each variable and level see Tab. 1). The weight for each station and variable is the inverse of the number of all available stations with information on the same variable within the influence radius. Subsequently, we balanced the different variables and levels against each other. The weights are adjusted such that the overall weight of a variable in a level is proportional to the total area covered by all the influence radii combined (for a map showing the covered area for a selected month and the temporal evolution of the coverage see Fig. 2). Within the surface level, 50% of the weight was attributed to SLP and 50% to the surface station temperature field.

Tab. 1. Average radius [km] beyond which the spatial correlation drops below 0.5, defining the influence radius of the stations.

Level   Temperature   GPH    Wind
L4      1529          1483   1311
L3      1379          1425   1267
L2      1398          1448   1142
L1      1421          1487   1017
L0      1266          -      -

Fig. 2. Temporal evolution of the covered area for a given variable (temperature, GPH, wind) and level (L0-L4). The coverage is expressed in %. The dashed line represents January 1944, for which the total area coverage is given in Fig. 3.
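The first weighting step of section 3.1 (each station weighted by the inverse of the number of stations within its influence radius) can be sketched as follows. This is a simplified illustration: counting the station itself among its neighbours, the haversine distance, and the names `great_circle_km` and `density_weights` are assumptions, and the subsequent balancing between variables and levels is not included.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km; inputs in degrees, broadcastable arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def density_weights(lat, lon, radius_km):
    """Weight = 1 / (number of stations within radius_km, station itself included)."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    # Pairwise distance matrix between all stations of one variable and level.
    dist = great_circle_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
    neighbours = (dist <= radius_km).sum(axis=1)
    return 1.0 / neighbours

# Three European sites close together and one isolated site on the US west coast,
# using the L1 temperature radius from Tab. 1 (coordinates are approximate).
lat = [52.2, 47.4, 48.9, 37.8]
lon = [14.1, 8.5, 2.3, -122.3]
weights = density_weights(lat, lon, radius_km=1421.0)
```

In this example the three clustered stations share their weight while the isolated station keeps a weight of one, which is the intended effect of down-weighting densely observed regions.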

Fig. 3. Area with predictor data coverage for different levels and variables for the case of January 1944.

3.2. Statistical model: setup

After the predictand has been regridded to an equal-area grid and the predictor has been perturbed, the regression model is set up. As described in the data section, the predictor network in the historical period changes over time, and some predictors have longer or shorter gaps. To make use of all available data in the historical period we build a separate statistical model for each month. To reconstruct 38 years (1920-1957) we have to form 456 individual models. A statistical model between the predictors and the predictands is fitted in the calibration period, and the derived relation is applied in the reconstruction period. The approach used here is based on a principal component (PC) regression model, similar to that of Brönnimann and Luterbacher (2004). It is explained here step by step.

To calibrate a model for a specific month in the past, a three-month moving window around the associated calendar month is used. For the reconstruction of January 1941, for example, all data from the months December, January and February in the calibration period are selected. In a further step, only those predictor series in the calibration period are selected which are available in the given month in the reconstruction period. The extracted subset of predictor variables in the calibration period is multiplied by the weighting field pertaining to the specified historical month (for the weighting field, see section 3.1). Next, a PC analysis is performed on the predictand dataset (standardized, all variables and levels combined) and another PC analysis is performed on the predictor subset. Each predictand PC time series is then expressed as a linear combination of an optimal subset of predictor PC time series using linear regression (least-squares estimator). The amount of variance retained in both PC analyses is the only quantity that has to be optimized iteratively, and this is done for each individual model. The retained variance was varied between 70% and 98% (independently on the predictor and the predictand side) and the best-performing subset (according to split-sample validations; see below) was chosen for the reconstruction. This procedure yields PC scores (for both predictors and predictands) and regression coefficients, which can then be applied to the reconstruction period. Expansion of the predictor PC time series to the reconstructed month is performed using the corresponding PC scores obtained in the calibration period. The predictor PC values are then multiplied by a set of regression coefficients, each set giving a value of one predictand PC. These values are then used as weights for a linear combination of the predictand PC scores obtained in the reconstruction period. Finally, the standardization procedure is reversed and the fields are regridded to a 2.5° by 2.5° grid.

3.3. Statistical model: validation

The reconstructions are validated using the split-sample validation (SSV) technique, a special case of cross-validation. For this purpose, the calibration period for the final reconstruction (1957-2002) is divided into a calibration part and a validation part for the SSV model. The statistical model is derived from the data in the SSV calibration period and tested in the independent SSV validation period.
This procedure was repeated twice with different time periods. The model was fitted in either 1958-1987 or 1972-2001 and tested in 1988-2001 or 1958-1971, respectively. The potential skill of the model is measured with the reduction of error statistic (RE, Cook et al. 1994), defined as

$$ \mathrm{RE} = 1 - \frac{\sum_t \left( x_{rec} - x_{obs} \right)^2}{\sum_t \left( x_{null} - x_{obs} \right)^2} \qquad (1) $$

where $t$ is time, $x_{rec}$ is the reconstructed value, $x_{obs}$ is the observed value and $x_{null}$ is our null hypothesis. As we reconstruct anomalies, the null hypothesis corresponds to a zero anomaly, in our case identical to the long-term mean annual cycle (1961-1990). Values of RE can lie between $-\infty$ and 1 (perfect reconstruction). An RE of 0 indicates a reconstruction no better than climatology, whereas an RE > 0 points to a model with predictive skill. Due to stochastic properties, RE values can be above zero by chance. Therefore we consider reconstructions useful if RE values are above 0.2. This corresponds approximately to an R² of 0.2 to 0.25 (see Brönnimann and Luterbacher 2004). Because our validation period in the SSV procedure is 14 years long, equation (1) sums over 14 time steps. The result of each SSV experiment is a spatial field of RE values on the predictand grid.
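The combination of PC regression and split-sample validation with the RE statistic of equation (1) can be sketched on synthetic data as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: all retained predictor PCs enter the regression (no search for an optimal subset), the retained variance is fixed at 90% rather than optimized, the inputs are assumed to be standardized anomalies, and all function names are hypothetical.

```python
import numpy as np

def pca(data, retained_variance):
    """PCA of a (time, space) anomaly matrix via SVD, truncated at a variance target."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(explained, retained_variance)) + 1, len(s))
    return u[:, :k] * s[:k], vt[:k]  # PC time series, spatial patterns

def fit_pc_regression(x_cal, y_cal, var_x=0.9, var_y=0.9):
    """Regress predictand PC time series on predictor PC time series (least squares)."""
    pcs_x, patterns_x = pca(x_cal, var_x)
    pcs_y, patterns_y = pca(y_cal, var_y)
    coef, *_ = np.linalg.lstsq(pcs_x, pcs_y, rcond=None)
    return patterns_x, coef, patterns_y

def predict(model, x_new):
    """Project new predictor data onto the calibration patterns and rebuild the field."""
    patterns_x, coef, patterns_y = model
    pcs_x = x_new @ patterns_x.T
    return pcs_x @ coef @ patterns_y

def reduction_of_error(x_rec, x_obs, x_null=0.0):
    """RE per grid point, equation (1), with sums taken over the time axis."""
    num = np.sum((x_rec - x_obs) ** 2, axis=0)
    den = np.sum((x_null - x_obs) ** 2, axis=0)
    return 1.0 - num / den

# Synthetic test case: three large-scale modes seen by 30 predictors and 50 grid points.
rng = np.random.default_rng(0)
n_time = 132
latent = rng.normal(size=(n_time, 3))
x = latent @ rng.normal(size=(3, 30)) + 0.3 * rng.normal(size=(n_time, 30))
y = latent @ rng.normal(size=(3, 50)) + 0.3 * rng.normal(size=(n_time, 50))

split = 90  # fit on the first part, validate on the withheld remainder
model = fit_pc_regression(x[:split], y[:split])
re_field = reduction_of_error(predict(model, x[split:]), y[split:])
print("median RE over the field:", np.median(re_field))
```

The retained-variance search described in section 3.2 would wrap `fit_pc_regression` in a loop over `var_x` and `var_y` between 0.70 and 0.98 and keep the pair with the highest field-median RE.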

For the model validation it is useful to aggregate the information into a single number. As the RE skill score has a fixed upper bound of one, distributions of RE values tend to be skewed. In this case the appropriate location estimator is the median. For the selection of an optimal subset of predictand and predictor PCs (see section 3.2), the RE median over the entire field is calculated and maximized. For the analysis of the fields, the RE value averaged over the two split-sample validations is usually given.

4. Validation results

4.1. Split-sample validations

Results from the SSV experiments are shown in the form of time series of the field-median value of RE, averaged over both SSVs, and in the form of example RE maps for specific months, as in Brönnimann and Luterbacher (2004). The time series of the RE medians show a strong seasonal variability. Generally, reconstructions are better in winter and worse in summer. This is not surprising and has been found in previous studies (Brönnimann and Luterbacher 2004). Large-scale atmospheric circulation patterns are easier to reconstruct, and such patterns are more dominant in winter than in summer. GPH is in general better reconstructed than temperature, and lower levels are better reconstructed than higher levels.

Concerning the temporal variability, there are two main changes in the predictor network. The first is the inclusion of radiosonde data around 1939 (although some series reach as far back as 1934). This more qualitative change is not visible in Fig. 1, where kite, aircraft and radiosonde data are shown as one category. The second change is the increase in wind data in the 1940s. In the case of GPH, the inclusion of radiosonde data increases the skill somewhat at upper levels in the summer season. The second change, the increasing amount of wind information, brings the RE series from different levels closer together and is a year-round effect.
Temperature shows a similar temporal evolution, although the skill is generally lower than for GPH. Both changes in the network increase the skill year-round (most strongly in summer). The inclusion of radiosonde data has a strong effect on 200 hPa temperature, which has no skill at all prior to 1939. In general the skill is satisfactory, with median RE values in winter reaching 0.75 for GPH at all levels and 0.6 for temperature at 850 and 500 hPa. The skill is worse in summer, but useful reconstructions of GPH can still be found in summer. Note, however, that the skill for temperature in summer is generally not good.

As an example of the spatial variability of RE, corresponding fields are shown for temperature and GPH at 700, 300 and 100 hPa for January 1933 and July 1938. The fields show that the skill of the reconstruction varies regionally. Generally the skill is largest at midlatitudes, which is expected given the better station coverage there. However, the skill is also good in some fields over the North Atlantic, while it is not very good over parts of Asia. In any case, care must be taken when using the reconstructions for a specific purpose.

Fig. 4. Time series of the median value of RE as a function of variable and level, averaged from both split-sample validation experiments.

Fig. 5. Fields of RE for temperature and GPH at the 700, 300, and 100 hPa levels for two selected months (January 1933 and July 1938), averaged from both split-sample validation experiments.

4.2. Validations with historical upper-air data

In addition to the SSV, we also compared the reconstructions with independent (i.e., not used in the reconstruction) historical upper-air data. The results are shown in the form of scatter plots (Fig. 6) as a function of level and variable, and in the form of time series of anomalies (Figs. 7-9).

The scatter plots show good overall agreement. They also show that the variability is underestimated (a consequence of the least-squares fitting), but no overall bias is evident. The underrepresentation of variability is stronger for temperature than for GPH.

Fig. 6. Observed and reconstructed anomalies of GPH and temperature at 500, 300, and 100 hPa GPH for Ellendale (green), Lindenberg (blue) and Oakland (red). Anomalies are with respect to 1961-1990.

Fig. 7. Time series of observed (brown) and reconstructed (blue) anomalies of temperature and geopotential height at 700 hPa at Lindenberg, 1923-1938. Error bars give the assumed uncertainty of the observations and the 95% confidence intervals for the reconstructions, respectively. Anomalies are with respect to 1961-1990.
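The loss of variance attributed to least-squares fitting can be illustrated with a simple regression on synthetic data. This is only a sketch with one hypothetical predictor, not the PC-based model of the report. For an ordinary least-squares fit with intercept, the variance of the fitted values equals the squared correlation times the variance of the target, so the fit is unbiased in the mean but damped in amplitude.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
predictor = rng.standard_normal(n)                 # e.g. a surface-based predictor
target = 0.7 * predictor + rng.standard_normal(n)  # "observed" upper-level anomaly

# Ordinary least-squares fit with intercept (np.polyfit returns slope first).
slope, intercept = np.polyfit(predictor, target, 1)
fitted = slope * predictor + intercept

r = np.corrcoef(predictor, target)[0, 1]
variance_ratio = fitted.var() / target.var()  # equals r**2 for this fit
mean_bias = fitted.mean() - target.mean()     # zero up to rounding

print(f"r^2 = {r**2:.3f}, variance ratio = {variance_ratio:.3f}, bias = {mean_bias:.1e}")
```

The weaker the predictor-target relationship, the stronger the damping, which is consistent with the underrepresentation of variability being larger for temperature than for GPH.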

These comparisons confirm the SSV results in that the skill of the reconstruction is better for GPH than for temperature, for which the correlation already drops off at 300 hPa. The agreement for 100 hPa GPH (although n is only 15) is excellent.

Figs. 7-9 show time series of monthly anomalies of temperature and GPH at three locations from historical upper-air data and from the reconstructions. The overall agreement is good, and extremes are well represented. Data and reconstructions are mostly within each other's confidence intervals. Some periods show biases, such as Oakland in 1942/1943. Since the number of predictors (from different networks) is large in these years, it is unlikely that the bias is real. Also, the validation of previous reconstructions (Brönnimann and Luterbacher, 2004) using different stations did not show this feature. We therefore suspect that this bias is a remaining data problem.

Fig. 8. Time series of observed (brown) and reconstructed (blue) anomalies of temperature at 850, 700, and 500 hPa at Ellendale, 1923-1932. Error bars give the assumed uncertainty of the observations and the 95% confidence intervals for the reconstructions, respectively. Anomalies are with respect to 1961-1990.

Fig. 9. Time series of observed (brown) and reconstructed (blue) anomalies of temperature and geopotential height at 500 hPa at Oakland, 1938-1945. Error bars give the assumed uncertainty of the observations and the 95% confidence intervals for the reconstructions, respectively. Anomalies are with respect to 1961-1990.
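A station comparison of this kind can be summarized with a few simple statistics: the correlation, the mean bias, and the fraction of months in which each series lies within the other's uncertainty range. The sketch below is a hedged illustration only. The function name `compare_series`, the treatment of both uncertainties as symmetric half-widths, and all numbers are assumptions, not values from the report.

```python
import numpy as np

def compare_series(obs, rec, obs_unc, rec_ci):
    """Summary statistics for observed vs. reconstructed anomalies.

    obs_unc: assumed half-width of the observation uncertainty.
    rec_ci:  assumed half-width of the 95% confidence interval of the reconstruction.
    """
    obs = np.asarray(obs, dtype=float)
    rec = np.asarray(rec, dtype=float)
    diff = rec - obs
    return {
        "correlation": float(np.corrcoef(obs, rec)[0, 1]),
        "mean_bias": float(diff.mean()),
        "obs_within_rec_ci": float(np.mean(np.abs(diff) <= rec_ci)),
        "rec_within_obs_unc": float(np.mean(np.abs(diff) <= obs_unc)),
    }

# Made-up GPH anomalies in gpm for five months (illustration only).
obs = [10.0, -20.0, 35.0, 5.0, -15.0]
rec = [5.0, -10.0, 25.0, 0.0, -5.0]
stats = compare_series(obs, rec, obs_unc=8.0, rec_ci=12.0)
print(stats)
```

Applied to a sliding window, a persistently non-zero `mean_bias` together with low overlap fractions would flag periods such as the one suspected at Oakland for closer inspection of the underlying data.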

5. Conclusions

In this report we present reconstructions of upper-level GPH and temperature up to 100 hPa for the northern extratropics. The reconstructions are based on a large amount of historical upper-air data as well as information from the Earth's surface. They cover the period 1920-1957. ERA-40 reanalysis is used to calibrate the statistical models that are used in the reconstruction process. Validations were performed within the calibration period (split-sample validations) as well as in the reconstruction period, using independent historical upper-air data. The validations show good skill for GPH and for the winter season, while care should be taken when analyzing temperature during the summer season. Specifically, it should be noted that the data are not suitable for trend analysis. Any analysis should always be accompanied by a thorough assessment of the reconstruction skill. These reconstructions will be extended both back in time and to other regions of the globe.

Acknowledgements

This work was supported by the Swiss National Science Foundation, project "Past climate variability from an upper-level perspective". We wish to thank all data providers, especially Roy Jenne and Joey Comeaux (NCAR) and Tom Ross (NOAA/NCDC), as well as MétéoFrance for providing pilot balloon data. Wolfgang Adam (DWD, German Weather Service) provided the historical data from Lindenberg that were used for the validation.

6. References

Allan, R., and T. Ansell, 2006: A new globally complete monthly historical gridded mean sea level pressure dataset (HadSLP2): 1850-2004. J. Clim., 19, 5816-5842.
Bengtsson, L., S. Hagemann, and K. I. Hodges, 2004: Can climate trends be calculated from reanalysis data? J. Geophys. Res., 109, D11111, doi:10.1029/2004JD004536.
Brohan, P., J. J. Kennedy, I. Harris, S. F. B. Tett, and P. D. Jones, 2006: Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850. J. Geophys. Res., 111, D12106.
Brönnimann, S., 2003a: Description of the 1939-1944 upper-air data set (UA39-44) Version 1.1. (University of Arizona, Tucson, USA).
Brönnimann, S., 2003b: A historical upper-air data set for the 1939-1944 period. Int. J. Climatol., 23, 769-791.
Brönnimann, S., and J. Luterbacher, 2004: Reconstructing Northern Hemisphere upper-level fields during World War II. Clim. Dyn., 22, 499-510.
Cook, E. R., K. R. Briffa, and P. D. Jones, 1994: Spatial regression methods in dendroclimatology: a review and comparison of two techniques. Int. J. Climatol., 14, 379-401.
Durre, I., R. S. Vose, and D. B. Wuertz, 2006: Overview of the Integrated Global Radiosonde Archive. J. Clim., 19, 53-68.
Eskridge, R., A. Alduchov, I. Chernykh, Z. Panmao, A. Polansky, and S. Doty, 1995: A Comprehensive Aerological Reference Data Set (CARDS): Rough and Systematic Errors. Bull. Amer. Meteor. Soc., 76, 1759-1775.
Ewen, T., A. Grant, and S. Brönnimann, 2008a: A monthly upper-air data set for North America back to 1922 from the Monthly Weather Review. Mon. Wea. Rev., 136 (5), 1792-1805.
Ewen, T., S. Brönnimann, and J. Annis, 2008b: An Extended Pacific-North American Index from Upper-Air Historical Data Back to 1922. J. Clim., 21 (6), 1295-1308.

Gong, D. Y., H. Drange, and Y. Q. Gao, 2006: Reconstruction of Northern Hemisphere 500 hPa geopotential heights back to the late 19th century. Theor. Appl. Climatol., 90, 83-102.
Grant, A., S. Brönnimann, T. Ewen, and A. Nagurny, 2008: A New Look at Radiosonde Data Prior to 1958. J. Clim. (submitted).
Hansen, J., R. Ruedy, J. Glascoe, and M. Sato, 1999: GISS analysis of surface temperature change. J. Geophys. Res., 104, 30997-31022.
Kistler, R., and Coauthors, 2001: The NCEP-NCAR 50-year reanalysis: monthly means CD-ROM and documentation. Bull. Amer. Meteor. Soc., 82, 247-267.
Klein, W. H., and Y. Dai, 1998: Reconstruction of Monthly Mean 700-mb Heights from Surface Data by Reverse Specification. J. Clim., 11, 2136-2146.
Lanzante, J. R., S. A. Klein, and D. J. Seidel, 2003: Temporal Homogenization of Monthly Radiosonde Temperature Data. Part I: Methodology. J. Clim., 16, 224-240.
Mitchell, T. D., and P. D. Jones, 2005: An improved method of constructing a database of monthly climate observations and associated high-resolution grids. Int. J. Climatol., 25, 693-712.
Parker, D. E., M. Gordon, D. P. N. Cullum, D. M. H. Sexton, C. K. Folland, and N. Rayner, 1997: A new global gridded radiosonde temperature data base and recent temperature trends. Geophys. Res. Lett., 24, 1499-1502.
Rayner, N. A., D. E. Parker, E. B. Horton, C. K. Folland, L. V. Alexander, D. P. Rowell, E. C. Kent, and A. Kaplan, 2003: Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century. J. Geophys. Res., 108, 4407.
Santer, B. D., and Coauthors, 2004: Identification of anthropogenic climate change using a second-generation reanalysis. J. Geophys. Res., 109, D21104, doi:10.1029/2004JD005075.
Schmutz, C., D. Gyalistras, J. Luterbacher, and H. Wanner, 2001: Reconstruction of monthly 700, 500 and 300 hPa geopotential height fields in the European and Eastern North Atlantic region for the period 1901-1947. Clim. Res., 18, 181-193.
Simmons, A. J., P. D. Jones, V. da Costa Bechtold, A. C. M. Beljaars, P. W. Kallberg, S. Saarinen, S. M. Uppala, P. Viterbo, and N. Wedi, 2004: Comparison of trends and low-frequency variability in CRU, ERA-40 and NCEP/NCAR analyses of surface air temperature. J. Geophys. Res., 109, D24115, doi:10.1029/2004JD005306.
Smith, T. M., and R. W. Reynolds, 2004: Improved extended reconstruction of SST (1854-1997). J. Clim., 17, 2466-2477.
Thorne, P. W., D. E. Parker, S. F. B. Tett, P. D. Jones, M. McCarthy, H. Coleman, and P. Brohan, 2004: Revisiting radiosonde upper-air temperatures from 1958 to 2002. J. Geophys. Res., 110, D18105, doi:10.1029/2004JD005753.
Uppala, S. M., and Coauthors, 2005: The ERA-40 re-analysis. Q. J. R. Meteorol. Soc., 131, 2961-3012.