Homogenization of long-term monthly Spanish temperature data


INTERNATIONAL JOURNAL OF CLIMATOLOGY
Published online in Wiley InterScience (www.interscience.wiley.com).1493

M. Staudt,* M. J. Esteban-Parra and Y. Castro-Díez
Departamento de Física Aplicada, Universidad de Granada, Granada, Spain

Abstract: Reliable time series are the basic ingredient when analysing climatic changes. However, the errors in real data are frequently of the same order as the signal being sought. Therefore, the available long-term monthly series of Spanish minimum and maximum temperatures have been compiled from the late 19th century on, in order to build a high-quality data set. The series are organized into climatically homogeneous regional groups and, in each group, the detection and adjustment are based on relative homogeneity and an analysis of the stationarity of the whole set of temperature-difference series. These series are scanned with moving t, Alexandersson, and Mann-Kendall tests. The detected inhomogeneities are adjusted by weighted averages of the regional series. The method is iterative and advances in steps of detection, adjustment, and updating. Individual inhomogeneous data are discarded and gaps are filled by similar weighted multiple means. For the analysis of the temperature evolution in the Iberian Peninsula, each region is finally represented by one local series and the regional average. The urban effect on minimum temperatures is adjusted by an empirical method, and for Madrid also by a correction derived from new homogenized data. Generally, rigorous homogeneity cannot be achieved because the initial data quality is deficient in many cases and metadata are sparse. Nevertheless, the data homogeneity and quality have been considerably enhanced: the total error margin in a series is of the order of 0.3-0.4 °C, assuming a worst-case error accumulation.
On the other hand, the number of inhomogeneities is considerable and their average amplitude is of the order of 1 °C, reflecting the much larger error margin in the raw data. The homogenized dataset constitutes an important basis for the subsequent detection of thermal changes in Spain in the last 130 years, at a clearly higher confidence level than before.

KEY WORDS temperatures; data homogeneity; statistical tests; climate change; Spain; Iberian Peninsula

Received 14 July 2005; Accepted 9 December 2006

* Correspondence to: M. Staudt, Departamento de Física Aplicada, Facultad de Ciencias, Campus de Fuentenueva, Universidad de Granada, Granada, Spain.

INTRODUCTION

Reliable data are a necessary basis for a study of the evolution of a climatic variable and the detection of changes. In many countries, systematic instrumental weather observations began in the 19th century and, since then, the availability of quantitative data has considerably improved. A time series of a climatic variable is called homogeneous when its variations have a climatic origin only (Mitchell et al., 1966). Unfortunately, the vast majority of climate records are adversely affected by non-climatic changes in the data. Relocations of an observatory, replacements of instruments, variations in the environment or in reading procedures, as well as human errors in data processing, are rather frequent. Under these circumstances, a series suffers artificial biases, most frequently sudden jumps or breaks, and may fail to represent the real climatic evolution. A reliable detection of climate change is hard or impossible when the error related to data quality is of the same order of magnitude as the signal being sought.

The large extent of the data quality problems is well known in the recent literature of climate research. In Chapter 12, the third assessment report of the IPCC (IPCC, 2001) states that "The quality of observed data is a vital factor."
The report continues: "Homogeneous data series are required with careful adjustments to account for changes in observing system technologies and observing practices." Moreover, Peterson et al. (1998b) point out that "Unfortunately, most long-term climatological time series have been affected by a number of non-climatic factors that make these data unrepresentative of the actual climate variation occurring over time." Trenberth (2002) notes that "we do not have an adequate climate observing system" and "There must be an active program of research and analysis utilizing climate data sets to ensure the data are state-of-the-art and meet requirements." Besides observational programmes for improving future data quality, a strong effort must undoubtedly also be dedicated to homogenization and quality control of the existing data.
An early introductory study on homogeneity and statistical tests was given in Mitchell et al. (1966). They described the problem of achieving absolute homogeneity. Among the more recent efforts in data homogenization, Goossens and Berger (1986) applied different statistical methods, such as the Mann-Kendall test, to the detection of changes in climatic series. Alexandersson (1986) developed the standard normal homogeneity test (SNHT), applied to the Swedish precipitation series in subsequent work (Alexandersson and Moberg, 1997; Moberg and Alexandersson, 1997). The SNHT is one of the most efficient tests for homogeneity, as Ducré-Robitaille et al. (2003) recently demonstrated.

Several homogenization methods have been created and first applied to North American data. Karl and Williams (1987) developed a method that explicitly considers the metadata and detects and adjusts data changes statistically, using adjacent series. This method is applied to a large number of North American temperature and precipitation series. Young (1993) and Rhoades and Salinger (1993) presented alternative methods, also based on similar data from highly correlated series. Peterson and Easterling (1994) and Easterling and Peterson (1995) developed a different strategy, with reference series and a Monte Carlo method, and they adjusted the data by least-squares linear regressions. The method of Vincent (1998) works with multiple regressions and is applied to daily Canadian temperature series in Vincent and Gullett (1999) and Vincent et al. (2002).

In recent studies, attempts have been made to homogenize European data. Slonosky et al. (1999) have created a method with multiple comparisons and adjustments between adjacent series, but without reference series, and have applied it to long-term European pressure series.
Their results prove to be similar to those of analytically more sophisticated methods, such as the statistical technique by Mestre (1999). González-Rouco et al. (2001) have homogenized the southwestern European precipitation series with an iterative method by extending the strategy of Hanssen-Bauer and Forland (1994). Stepanek (2003) has recently created the software AnClim, especially for the practical application of virtually all relevant homogenization methods for climate data. In a recent international effort on data quality, Wijngaard et al. (2003) analyzed the daily temperature and precipitation data of the European Climate Assessment (ECA). They found that the vast majority of the series suffer clear homogeneity problems.

Nevertheless, among the applications of the homogenization methods in the literature, there is still a lack of systematic treatment of Spanish temperature data. The present study carefully prepares these data series, seeking to achieve maximum data quality. The aim is to set a solid base for a reliable subsequent analysis of thermal changes and their confidence levels on a regional scale since the late 19th century.

DATA

The Spanish temperature data used in this study have been provided by the National Meteorology Institute (INM). The recording of monthly temperatures began sometime between 1869 and 1880 in about 20 observatories, mainly in province capitals (older records are rare), although at some sites the observations were not recorded until the first or second decade of the 20th century. Data quality is problematic or even poor in many cases, because of frequent site changes and data gaps, and metadata are scarce. Figure 1 gives a schematic overview of the temporal data coverage until 1980 and shows the geographic distribution of the observatories.

Definition of the regional groups of data series

The Spanish monthly temperature series contain a high degree of common variability.
The cross-correlations between the anomalies usually exceed 0.5, even at distances of the order of 500 km. Nonetheless, the temperature-anomaly patterns show regional distinctions, as found for the winter maxima by Frías Domínguez et al. (2002). The prior compilation of the data series into climatic groups derives from these regional differences. The basic threefold distinction separates the peninsular mode of thermal evolution, which geographically includes the central plains and the major part of the south, from the Mediterranean (eastern) and the Cantabrian (northern) coastal areas. Furthermore, Galicia, western Andalusia, Extremadura and the Ebro valley are also treated as climatically different groups, in order not to eliminate possible regionally distinctive details of the temperature evolution. A preliminary analysis of the series from the high plains and the Mediterranean did not detect significant differences between the temperature evolutions in their northern and southern regions. The cross-correlations between the anomalies in each regional group systematically exceed 0.6 and clearly confirm the high level of regional synchronicity of the variations, an essential ingredient for the homogenization method.

According to these results, the regional groups that will be homogenized separately, without mixing information between them by adjustments, are (the number of series in each group is given in parentheses): Galicia (6), Cantabria (5), Ebro valley (4), Mediterranean (6), central high plains (14), western Andalusia (4) and Extremadura (2). In each of these climatic regions, all the series are homogenized and then the regional mean series (simple mean of the anomalies) is computed a posteriori. From the homogeneity viewpoint, each individual series could represent its region, but the regional mean series is particularly valuable for subsequent analysis.
Hence, each region is represented by one local series and the regional mean, in order to analyze the recurrence of the results, in the sense of coherence between the two representative series: a variability feature is of high authenticity if it appears in both series.
Figure 1. Scheme of the temporal and spatial coverage of the Spanish maximum and minimum temperature series, between 1860 and 1980 (in more recent years, the coverage is complete, with very few exceptions). The series are: 1. La Coruña, 2. Santiago, 3. Pontevedra, 4. Orense, 5. Vigo, 6. Finisterre, 7. San Sebastián, 8. Bilbao, 9. Santander, 10. Vitoria, 11. Pamplona, 12. Oviedo, 13. Zaragoza, 14. Huesca, 15. Logroño, 16. Teruel, 17. Lérida, 18. Gerona, 19. Barcelona, 20. Castellón, 21. Valencia, 22. Alicante, 23. Murcia, 24. Almería, 25. Burgos, 26. Valladolid, 27. Salamanca, 28. Soria, 29. León, 30. Palencia, 31. Zamora, 32. Ávila, 33. Segovia, 34. Madrid, 35. Guadalajara, 36. Toledo, 37. Cuenca, 38. Albacete, 39. Ciudad Real, 40. Córdoba, 41. Seville, 42. Huelva, 43. Jerez, 44. Málaga, 45. Granada, 46. Jaén, 47. Badajoz, 48. Cáceres. The regional groups are: A) Galicia, B) Cantabria, C) Ebro valley, D) Mediterranean, E) central plains, F) western Andalusia and G) Extremadura.

Discarded data due to homogeneity problems

Data or intervals had to be rejected when the homogeneity problems were too strong to permit an adjustment at an acceptable confidence level. This happens under the following circumstances:

Individual data or intervals are discarded if their difference from at least two (or three) of the other series of the region is extreme at the 95% confidence level (in an appropriate time interval around these data). Disconnected short intervals (shorter than a decade) with many interruptions are also discarded, as well as intervals where more than approximately one-third of the data are missing. Apart from all the available difference series, the anomalies of the candidate series are always thoroughly cross-checked.
An interval has to be discarded when the available data in a given region do not permit an adjustment of an inhomogeneous break at a satisfactory confidence level (when no other or only one more series is available).
A whole series is generally discarded when more than five discontinuous breaks (or other clear inhomogeneities to be adjusted) are found. This decision also depends on the length and overall quality of the series (one long series is maintained with six adjusted breaks; see Table I).

Table I. Total numbers of data, adjustments (adj.) and rejected (rej.) data (individual data or intervals) in the maximum and minimum temperature series. Columns: series; number of data, number of adjustments and number of rejected data, given separately for the maximum and the minimum temperatures. Series listed: La Coruña, Santiago, Pontevedra, Orense, Vigo, Finisterre, San Sebastián, Bilbao, Santander, Vitoria, Pamplona, Zaragoza, Huesca, Logroño, Lérida, Valencia, Gerona, Barcelona, Castellón, Alicante, Murcia, Madrid, Ávila, Burgos, León, Palencia, Salamanca, Segovia, Soria, Zamora, Guadalajara, Toledo, Albacete, Ciudad Real, Cuenca, Seville, Córdoba, Huelva, Jerez, Málaga, Badajoz, Cáceres. [Numerical entries missing in the source.]

Unfortunately, the following long intervals or entire series had to be rejected:

In eastern Andalusia, the available long-term series from Jaén and Granada were discarded because their temporal data coverage was unsatisfactory.

The data in Galicia before 1880 and the minima in the Mediterranean before 1893 were also rejected because of severe homogeneity problems and/or lack of data.

The 19th century data in western Andalusia and Extremadura could not be connected with sufficient confidence to the 20th century data and were therefore not considered. The lost information was partially recovered by defining an average series that consisted of western Andalusia, Extremadura and Málaga, where these data were connected and used.
Some individual series or long intervals were rejected because of severe homogeneity problems: the maxima and minima series in Valladolid, the maximum records in Alicante and the minima in Ciudad Real, as well as the minima in Orense until 1949, the maxima in León until 1937, the minima in Guadalajara until 1970 and in Cuenca until [year missing in the source].

METHODOLOGY

Statistical properties of the monthly temperature data

Temperature records show little variation on spatial scales of hundreds of kilometres in regions with a regular orography, such as the central plains of the Iberian Peninsula and along the coasts, where the cross-correlations between the monthly series generally exceed 0.7. Nonetheless, regional differences may be crucial in studies of the temperature evolution and its significance levels. The dataset of the present study is developed not only for a high-confidence analysis of the general trends, but also of the interregional differences in Spanish temperatures.

Monthly temperature series show a distinct lack of stationarity, because of frequent trends at time scales of months or several years, which are highly significant in many cases. Schönwiese and Rapp (1997) point out that "... short-term trends ... become enormously unstable in all seasons, even changing their sign." This variability characteristic complicates the detection of inhomogeneities and requires high significance levels, to avoid an erroneous detection and attribution of inhomogeneities. These stationarity properties do not differ significantly among the treated regions and, therefore, the same statistical criteria are applied everywhere.

The statistical distribution of the temperature data is normal to a good approximation, and there is no problem in applying parametric statistics designed for Gaussian-distributed variables, such as the t-test or the SNHT.
The autocorrelations (serial correlations) in these series are rather slight (coefficients between 0.1 and 0.3), but several statistical tests require corrections (the reduced sample size for the t-test and prewhitening of the series for the Mann-Kendall test) in order to achieve realistic confidence levels.

The basic homogenization concept

The criterion of absolute homogeneity is fulfilled if a climatic series does not include any variability, except
for the real climatic evolution. However, this condition is almost never fulfilled, because of the problems in real data. Easterling et al. (1996) pointed out that "... the real homogeneity of climatic data is irretrievably lost." From the analysis of an individual series, it is generally impossible to decide at a high confidence level whether or not a certain change is inhomogeneous, and the absolute-homogeneity criterion is therefore not applied in the present study.

The concept of relative homogeneity developed here is based not on individual series, but on their differences, because the anomalies of highly correlated time series are essentially synchronous. Hence, a local inhomogeneity can be detected in the difference series, where, on the other hand, an authentic extreme anomaly tends to vanish. This detection method fails if several series suffer a simultaneous data problem (e.g. a common sudden jump). Comparing as many difference series as possible minimizes this risk.

The following relative homogenization method is based on multiple comparisons between the climatically similar series within each predefined climatic region. No reference series is defined, because the frequent inhomogeneities and missing data do not permit a reliable a priori reference. The whole set of difference series (differences of anomalies) is statistically tested for significant changes (see "The scheme of the homogenization method"). Once identified, an inhomogeneous change is adjusted by a weighted mean of the most highly correlated series. The weighting factors depend on the synchronicity (cross-correlation) and the number of common data of each surrounding series of the same region, relative to the candidate. For an abrupt change, the after-minus-before difference is replaced by this weighted average (see "The adjustment algorithm"). The series are adjusted separately in each region, to avoid merging information.
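The relative-homogeneity idea can be illustrated with a minimal Python sketch. It assumes complete monthly series held in plain lists and uses the whole record as the reference period; both are simplifications, and the function names `monthly_anomalies` and `difference_series` are hypothetical helpers, not part of the original study. The data are synthetic.

```python
import math
from itertools import combinations
from statistics import mean


def monthly_anomalies(values):
    """Subtract the mean of each calendar month (index 0 is a January).

    Assumption: the whole record serves as the reference period.
    """
    climatology = [mean(values[m::12]) for m in range(12)]
    return [v - climatology[i % 12] for i, v in enumerate(values)]


def difference_series(anomalies):
    """All pairwise differences of anomaly series: n(n-1)/2 for n stations."""
    return {
        (a, b): [x - y for x, y in zip(anomalies[a], anomalies[b])]
        for a, b in combinations(sorted(anomalies), 2)
    }


# Synthetic demo: three stations share the annual cycle and a regional signal;
# station "B" has an artificial +1.0 jump from month 120 onwards.
n_months = 240
cycle = [10 + 8 * math.cos(2 * math.pi * (i % 12) / 12) for i in range(n_months)]
regional = [0.5 * math.sin(i / 7.0) for i in range(n_months)]
raw = {
    "A": [c + r for c, r in zip(cycle, regional)],
    "B": [c + r + 2.0 + (1.0 if i >= 120 else 0.0)
          for i, (c, r) in enumerate(zip(cycle, regional))],
    "C": [c + r - 1.0 for c, r in zip(cycle, regional)],
}
anoms = {name: monthly_anomalies(series) for name, series in raw.items()}
diffs = difference_series(anoms)

# The jump appears as a step in A-B, while A-C stays flat.
step_ab = mean(diffs[("A", "B")][120:]) - mean(diffs[("A", "B")][:120])
spread_ac = max(abs(d) for d in diffs[("A", "C")])
print(f"step in A-B: {step_ab:+.2f}, largest |A-C|: {spread_ac:.2e}")
```

In this toy case the shared regional signal cancels in every difference series, so only the artificial jump of station B remains visible, and it shows up in both difference series that involve B.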
This is essential to prepare the dataset for a subsequent detection of regional differences.

The scheme of the homogenization method

1. The raw-data series are converted into anomalies, relative to the monthly mean of a given reference period (the final reference is [period missing in the source]). The whole set of anomaly difference series is computed within each region (these are more efficient than absolute differences, because in the latter, stronger residuals of the annual cycle remain). Following the idea of multiple comparisons, in a region with $n$ series, $\sum_{i=1}^{n-1} i = n(n-1)/2$ difference series are simultaneously analyzed, in order to detect (and then to adjust) the significant inhomogeneities in all series.

2. The suspicious inhomogeneities are marked (mostly abrupt changes or breaks, but also individual extreme data), with particular attention to the metadata information.

3. The largest and most obvious extreme values (outliers) are identified and discarded when the anomalies exceed a certain level (four standard deviations of a running 30-year interval, centered at each data point, although data coverage sometimes restricts the length of the detection interval). This search is based on the difference series, to avoid the rejection of authentic large anomalies. In this step the criterion is deliberately strict, so that smaller inhomogeneous data are still retained: it removes only the very large inhomogeneous outliers, prior to the closer analysis.

4. The set of difference series is recalculated and the possible abrupt inhomogeneities (breaks) are searched for and classified. Then, for each feature suspected to be inhomogeneous, an appropriate base interval is individually defined for statistical detection and verification. The length of these base intervals is generally [value missing in the source] years, placed symmetrically around the possible breakpoint where possible, and they must strictly avoid temporal overlap with other inhomogeneities, which would produce skewed results.
Besides a reasonable sample size (at least of the order of 100), the so-called station drift must be considered: the differences between highly correlated temperature series are often not stationary, but show frequent trends of changing sign (even in the absence of site changes or other inhomogeneities; Rhoades and Neill, 1995). Therefore, the base intervals of the candidate series must be shorter if a stronger drift (less stationarity) is present, because earlier or later data are then less valid for the adjustment at a certain time.

5. The statistical tests are applied to the whole set of difference series in the base intervals defined in the previous step. Moving t and SNHT (Alexandersson) tests scan the intervals, to determine the probability of a break as a function of its time (see "The statistical detection of discontinuous inhomogeneities"). Special attention is given to the metadata, by first examining the time intervals around the incidents reported in the literature. However, the metadata are scarce: the method takes them into account but does not depend on them. The general detection criterion for an inhomogeneous break is a significance clearly above the 99% level in the t-test and a statistic at least 50% above the 95% level in the SNHT, recurring in three difference series with highly correlated data. The local anomalies are checked in order to avoid wrong conclusions and, in doubtful cases, the results are subjected to the sequential Mann-Kendall test.

6. Once an inhomogeneous break is detected, the adjustment works with a weighted average of the most highly correlated simultaneous regional data (up to five series). The candidate's after-minus-before difference is replaced by a weighted average of the analogous differences of the correction series. In a very few particular and highly significant cases, continuous inhomogeneous features are detected and adjusted by a similar procedure.
The method is similar because the detection is performed with the same statistical tests and the adjustment consists of a linear trend that is obtained
as the weighted mean of the slopes of the highly correlated nearby series.

To ensure the essential non-interference between the different adjustments, steps 4-6 are executed iteratively and jointly for all series: after adjusting all the disjoint inhomogeneities in all the series of a region in the first iteration, the set of difference series is recalculated before applying the tests again in the second iteration. This iterative method is necessary because the correction intervals frequently overlap each other and, in this sense, not all breaks and their corrections are independent or disjoint from each other. Furthermore, in some cases, slighter inhomogeneities could not be detected until a large inhomogeneity was adjusted and the detection was repeated in the next iteration step (with all data updated). The iteration stops when no more significant inhomogeneities are detected after updating the series.

7. A search is made for the individual inhomogeneous data by detecting extreme values (as explained in part F) of the difference series and checking these data points in each local series. The detected inhomogeneous data are removed.

8. The missing data are filled by weighted means of the best-correlated synchronous data (originally missing data or gaps created by removed inhomogeneous data). The filling algorithm works with up to five regional series and assumes synchronicity between these series (see "The replacement of missing data").

9. Finally, the dataset is prepared with two time series for each climatic region, expressed as anomalies relative to the reference period [period missing in the source]: one local series and the regional average (all the local series are also available, for further purposes).
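The coarse outlier screening of step 3 in the scheme above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `flag_outliers` is hypothetical, the tested value is left out of its own window statistics (a design choice made here), and the difference series is synthetic.

```python
import math
from statistics import mean, stdev


def flag_outliers(diff, half_window=180, threshold=4.0):
    """Return the indices whose value deviates by more than `threshold`
    standard deviations from the mean of a window centered on that index.

    half_window=180 months corresponds to the running 30-year interval.
    Assumption: the tested value itself is excluded from the window.
    """
    flagged = []
    for i, value in enumerate(diff):
        lo = max(0, i - half_window)
        hi = min(len(diff), i + half_window + 1)
        window = diff[lo:i] + diff[i + 1:hi]  # shorter near the series ends
        if len(window) < 2:
            continue
        mu, sigma = mean(window), stdev(window)
        if sigma > 0 and abs(value - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Synthetic difference series: bounded pseudo-noise plus one gross error.
diff = [0.5 * math.sin(1.3 * i) + 0.3 * math.cos(0.7 * i) for i in range(600)]
diff[300] += 5.0
flagged = flag_outliers(diff)
print(flagged)
```

Running the screening on difference series rather than on the local anomalies follows the reasoning of step 3: a genuine regional extreme is shared by the neighbouring stations and largely cancels in the difference, so it is not flagged.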
The statistical detection of discontinuous inhomogeneities

A break is detected when the corresponding significance exceeds the 99% level in the t-test and exceeds the 95% level by 50% in Alexandersson's test, recurrently in at least three cases (three difference series of the same candidate).

The windowed t-test. This well-known statistical test measures the significance level of a change in the mean; it is parametric and assumes normality and serial independence. It is robust against slight deviations from normality if the sample is large enough ($n > 20$), but significant autocorrelations cause skewed results. The test overestimates the significance when the autocorrelations are positive (common in temperature records). With a first-order autocorrelation coefficient $\rho_1$, the reduced-sample-size correction replaces the sample size $n$ by $n' = n(1 - \rho_1)/(1 + \rho_1)$. This correction is valid because the memory of monthly temperature records is rather short and their autocorrelations are essentially of first order. Preliminary experiments show that this correction reduces the statistical confidence typically by 20-30%.

The SNHT (Alexandersson's standard normal homogeneity test). This test by Alexandersson (1986) (initially applied to the Swedish precipitation series) is now frequently used in climatology. It detects a single abrupt change (break) in the mean value of a Gaussian time series, assuming two stationary subseries, before and after the (possible) break, against the null hypothesis of one stationary series. The 95% critical value of the test statistic is 9.15 for a sample size of 100 and rises slightly to 10 for 400 data and to 10.5 for 800 data. As mentioned, the present study requires higher significance levels: the detection is considered highly significant if the statistic exceeds the 95% level by 50% (value ≈ 15). An example of the detection of an inhomogeneous break is given in Figure 2.
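Both tests described above can be sketched in a few lines of Python. The sketch rests on several assumptions that the text does not specify: the reduced sample size is applied separately to the subsamples before and after the candidate break in a Welch-type statistic, negative autocorrelation estimates are clipped to zero, and the SNHT uses the sample standard deviation. All function names are hypothetical and the series is synthetic.

```python
import math
from statistics import mean, stdev, variance


def lag1_autocorrelation(x):
    """Sample estimate of the first-order autocorrelation rho_1."""
    m = mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    return num / sum((a - m) ** 2 for a in x)


def effective_size(x):
    """n' = n (1 - rho_1) / (1 + rho_1); negative rho_1 is clipped (assumption)."""
    rho1 = max(0.0, lag1_autocorrelation(x))
    return len(x) * (1 - rho1) / (1 + rho1)


def t_statistic(before, after, correct=True):
    """Statistic for a change in mean, optionally with reduced sample sizes."""
    n1 = effective_size(before) if correct else len(before)
    n2 = effective_size(after) if correct else len(after)
    spread = math.sqrt(variance(before) / n1 + variance(after) / n2)
    return (mean(after) - mean(before)) / spread


def snht(x):
    """Return (T0, a): the maximum of T(a) = a*z1^2 + (n-a)*z2^2 over all
    split points, and the number a of values before the most likely break."""
    m, s = mean(x), stdev(x)
    z = [(v - m) / s for v in x]
    n, total = len(z), sum(z)
    best_t, best_a, running = -1.0, 0, 0.0
    for a in range(1, n):
        running += z[a - 1]
        z1 = running / a                  # mean of the first a values
        z2 = (total - running) / (n - a)  # mean of the remaining values
        t = a * z1 ** 2 + (n - a) * z2 ** 2
        if t > best_t:
            best_t, best_a = t, a
    return best_t, best_a


# Synthetic difference series with a +1.0 break after the first 100 values.
series = [0.5 * math.sin(1.3 * i) + 0.3 * math.cos(0.7 * i)
          + (1.0 if i >= 100 else 0.0) for i in range(200)]
t_plain = t_statistic(series[:100], series[100:], correct=False)
t_corr = t_statistic(series[:100], series[100:])
t0, a = snht(series)
print(f"t (plain) = {t_plain:.1f}, t (reduced n) = {t_corr:.1f}")
print(f"SNHT T0 = {t0:.1f} at a = {a}")
```

Because the synthetic noise is positively autocorrelated, the corrected t statistic comes out smaller than the uncorrected one, which is the direction of the effect described for the reduced-sample-size correction.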
The running t-test and SNHT show a similar behavior and confirm a highly significant break at the beginning of the 1980s (the 95% levels are ≈ 2 for the t-test and ≈ 10 for the SNHT). The SNHT has a sharper peak, due to its quadratic algorithm. The t-test in the 20-year running window gives lower significance levels than in the 40-year interval, because of the smaller sample size.

The adjustment algorithm

After an inhomogeneity (break) in the candidate series and its time are detected, the adjustment works as follows:

Case 1. When there is a sufficiently long overlapping interval (at least 3 years) of the subseries $x_t^1$ and $x_t^2$ around the break point, after verifying the synchronicity of the evolution in both subseries and the absence of clearly inhomogeneous features, the adjustment is made as the mean difference $\Delta = \frac{1}{k} \sum_{t=1}^{k} (x_t^1 - x_t^2)$ in the overlapping period ($t = 1, \ldots, k$).

Case 2. When a series undergoes a break for non-climatic reasons, usually there are no overlapping data and the adjustments are based on multiple differences between the candidate series and the highly correlated series of the same region. The cross-correlations $\rho_j$ between the candidate series $x_t$ and the available series $x_t^j$ are computed for these intervals, with corrections for the autocorrelations, if necessary (use of the whitened residuals, after separating an ARIMA process from the series). Up to $k = 5$ series are chosen for correction under the criterion of highest cross-correlations. Given an adjustment interval of $m$ months, with data $x_t$ and $x_t^j$ at each side of the break at $t = \tau$: $\{x_t, x_t^j;\ t = \tau - m + 1, \ldots, \tau;\ j = 1, \ldots, k\}$ and $\{x_t, x_t^j;\ t = \tau + 1, \ldots, \tau + m;\ j = 1, \ldots, k\}$, the mean after-minus-before difference is computed for each $j = 1, \ldots, k$. With the indices bef = before and aft = after the break, the
partial adjustment $\Delta_j$, given by one neighbouring series $j$, is

$$\Delta_j = (\bar{x}^j_{aft} - \bar{x}_{aft}) - (\bar{x}^j_{bef} - \bar{x}_{bef}) = \frac{1}{m} \left( \sum_{t=\tau+1}^{\tau+m} (x_t^j - x_t) - \sum_{t=\tau-m+1}^{\tau} (x_t^j - x_t) \right) \quad (1)$$

The total adjustment term $\Delta$ is a linear superposition of the $k$ individual offsets $\Delta_j$, with the squared cross-correlations $\rho_j^2$ and the coefficients $q_j$ (common data fraction) as weight factors:

$$\Delta = \left( \sum_{j=1}^{k} \rho_j^2 q_j \Delta_j \right) \Big/ \left( \sum_{j=1}^{k} \rho_j^2 q_j \right) \quad (2)$$

This adjustment is applied to the data before the break, because leaving the recent data unchanged is a practical advantage for later updates.

Figure 2. (A) A difference series of maximum temperatures; (B) the coefficients of the t-test with a 20-year running window (discontinuous line) and in the whole 40-year interval (both left axis) and of the SNHT in the 40-year interval (thick line, right axis).

The adjustments of the breaks are always based on the whole monthly dataset but, generally, the adjusted value does not depend on the month or the season. Only a few seasonally distinct adjustments are applied, when the seasonal discrepancies are particularly large. In the literature, different types of adjustments can be found (see for example Peterson et al., 1998a). Inhomogeneities in climate data often depend on the month or season, because of the seasonally diverging impacts of instrumental or environmental changes. Hence, an adjustment that depends on the month of the year can theoretically be better. However, it modifies the variability, the autocorrelation structure, and the annual cycles of the data, whereas the adjustments generally performed here consist of a simple additive term. Furthermore, a monthly varying adjustment must work with 12 times fewer data (for a given interval length) and the confidence margins are substantially wider. Hence, adjustments of this type become more attractive when the initial data quality is higher than in the present study.
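A minimal Python sketch of the Case 2 adjustment, following Equations (1) and (2), is given below. The function names are hypothetical and the data are synthetic. One point is an assumption rather than something the text states: with the offset defined as neighbour minus candidate, after minus before, the sketch subtracts the total offset from the candidate data before the break, which is the sign that makes the difference series level across the break.

```python
import math
from statistics import mean


def partial_offset(candidate, neighbour, tau, m):
    """Eq. (1): change in the mean of (neighbour - candidate) from the m values
    up to index tau (before the break) to the m values after it."""
    d_before = mean(neighbour[t] - candidate[t] for t in range(tau - m + 1, tau + 1))
    d_after = mean(neighbour[t] - candidate[t] for t in range(tau + 1, tau + m + 1))
    return d_after - d_before


def total_adjustment(candidate, neighbours, rho, q, tau, m):
    """Eq. (2): mean of the partial offsets, weighted by rho_j^2 * q_j."""
    weights = [r * r * qj for r, qj in zip(rho, q)]
    offsets = [partial_offset(candidate, nb, tau, m) for nb in neighbours]
    return sum(w * d for w, d in zip(weights, offsets)) / sum(weights)


def adjust_before_break(candidate, delta, tau):
    """Shift the data up to index tau; recent data stay unchanged.
    Sign convention (assumption): subtract delta, see the lead-in."""
    return [x - delta if t <= tau else x for t, x in enumerate(candidate)]


# Synthetic demo: the candidate jumps by +1.0 after index 59.
signal = [0.5 * math.sin(i / 5.0) for i in range(120)]
candidate = [s + (1.0 if i >= 60 else 0.0) for i, s in enumerate(signal)]
neighbours = [[s + 0.3 for s in signal], [s - 0.4 for s in signal]]
rho = [0.9, 0.8]  # cross-correlations with the candidate (illustrative values)
q = [1.0, 0.5]    # fractions of common data (illustrative values)

delta = total_adjustment(candidate, neighbours, rho, q, tau=59, m=36)
adjusted = adjust_before_break(candidate, delta, tau=59)
print(f"total adjustment: {delta:+.2f}")
```

In this toy case both neighbours yield the same partial offset, so the weights do not change the result; with real data the weights favour the neighbours that are better correlated with the candidate and share more data with it.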
The detection of individual extreme anomalies

The extreme anomalies are detected relative to a symmetrically running 30-year interval centered at each data point. The detection does not work relative to a fixed reference interval, but with a moving window, to determine extreme events relative to the mean temperatures and variability of their adjacent period. All local extreme anomalies are catalogued but, as in the preceding steps, their differences are crucial for the homogeneity analysis. An anomaly is generally (with few exceptions) considered inhomogeneous when its amplitude exceeds the 2.81σ level (99.5% confidence) in at least three difference series between the candidate and the surrounding series. Once the inhomogeneous data are deleted, the gaps can be filled in, as described in the following section.

The replacement of missing data

Data gaps are frequent in almost all climate data and are an obstacle for an analysis that requires complete series. However, rejecting all incomplete series would mean discarding almost all data, and therefore a filling strategy for the gaps is necessary. Missing at random (MAR; Little and Rubin, 1987) is a basic condition for missing data (usually presupposed). It is fulfilled when the occurrence probability of a gap at a certain time is independent of the variable's value at this time. This is not the case when certain data are lacking because the values were extreme and could therefore not be measured.

The first type of available information for filling gaps is the intrinsic information in the series. The ARIMA method (autoregressive integrated moving average; see for example Box and Jenkins, 1976) analyses stationarity and autocorrelations of a series and decomposes it into an ARIMA series and white-noise residuals. A prediction can be drawn from the ARIMA subseries (the best predictor of white noise is always zero). For monthly temperatures, this method has little predictive potential
because the white-noise component generally explains 70–90% of the variability. Hence, the ARIMA method is used to fill a data gap with the intrinsic information of the candidate series only when no simultaneous regional data are available.

The second type of information, the synchronous data of the adjacent series, is more efficient in replacing missing data, on account of the high cross-correlations between nearby series. The basic idea is to weight the contribution of each time series according to its confidence level. This is done by a weighted average of the anomalies of up to m = 5 related series. Several points have to be considered for a proper gap-filling algorithm. The most relevant information is again contained in the adjacent years, because the memory of the series is rather short. Hence, all the anomalies involved are computed relative to a symmetric interval (usually 30 years) around the gap. The interpolations are based on standardized anomalies, because standard deviations may differ systematically, even between highly correlated temperature series. The higher the correlation with the candidate, the higher the weight given to the contribution of a series. The confidence in a contribution that is based on a reduced number of data in common with the candidate decreases, and so must its weighting factor. If the cross-correlations of the surrounding series with the candidate are weak, the amplitude of the correction is reduced. This means a more cautious gap filling (closer to zero), because in the case of complete ignorance the gap would be filled with an anomaly of zero.

Let $T_c$ and $T_j$ ($j = 1, \ldots, m$) be the temperature anomalies of the candidate and the correction series at a certain time, $\sigma_c$ and $\sigma_j$ their standard deviations, and $c_j$ the weighting factors.
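The weighting principles listed above, which this subsection formalizes in Equations (3) and (4) together with the damping factor $S$, can be sketched in a few lines. This is only an illustrative sketch: the function name, the argument layout and the choice of returning `None` to signal the fall-back to the ARIMA interpolation are assumptions of this example, and the anomalies are taken to be already computed relative to the symmetric interval around the gap.

```python
def fill_anomaly(neigh_anoms, neigh_sigmas, sigma_c, rhos, qs):
    """Estimate the candidate's missing anomaly from up to five neighbours.

    neigh_anoms  : anomalies T_j of the neighbouring series at the gap
    neigh_sigmas : their standard deviations sigma_j
    sigma_c      : standard deviation of the candidate series
    rhos         : cross-correlations rho_j with the candidate
    qs           : common-data parameters q_j
    Returns None when the correlations are too weak (S < 0.5), in which case
    the text falls back to an ARIMA interpolation of the candidate itself.
    """
    rho2 = [r ** 2 for r in rhos]
    raw = [q * r2 for q, r2 in zip(qs, rho2)]
    total = sum(raw)
    weights = [w / total for w in raw]          # Eq. (4): weights sum to one

    # Eq. (3): weighted mean of the neighbours' standardized anomalies,
    # rescaled with the candidate's standard deviation.
    estimate = sum(c * (sigma_c / s) * t
                   for c, s, t in zip(weights, neigh_sigmas, neigh_anoms))

    s_factor = sum(rho2)                        # sum of squared correlations S
    if s_factor < 0.5:
        return None
    if s_factor < 1.0:
        estimate *= s_factor                    # more cautious, closer to zero
    return estimate


# Hypothetical values: two neighbours with different variability.
strong = fill_anomaly([1.0, 2.0], [1.0, 2.0], 1.5, rhos=[0.9, 0.9], qs=[1.0, 1.0])
weak = fill_anomaly([1.0, 2.0], [1.0, 2.0], 1.5, rhos=[0.6, 0.6], qs=[1.0, 1.0])
too_weak = fill_anomaly([1.0, 2.0], [1.0, 2.0], 1.5, rhos=[0.4, 0.4], qs=[1.0, 1.0])
```

With strong correlations the estimate is the plain weighted mean; with weaker ones it is shrunk towards zero by $S$, and below $S = 0.5$ no regional estimate is returned at all.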
Then, the anomaly to fill in the gap of the candidate series is

$$T_c = \sum_{j=1}^{m} c_j \frac{\sigma_c}{\sigma_j} T_j, \quad \text{with} \quad \sum_{j=1}^{m} c_j = 1 \tag{3}$$

The weights $c_j$ depend on the squared cross-correlations $\rho_j^2$ with the candidate and on a common-data parameter $q_j$, so that

$$c_j = q_j\, \rho_j^2 \Big/ \sum_{i=1}^{m} q_i\, \rho_i^2 \tag{4}$$

Weak cross-correlations are taken into account by computing their sum of squares $S$. If this factor is smaller than unity, the temperature estimate in Equation (3) is multiplied by $S$. If $S$ is even smaller than 0.5, this method is discarded and the data gap is filled by an ARIMA interpolation (intrinsic information in the candidate series). After this step, the regional average is computed and the series are almost ready for a comparative analysis in each region, with minimized homogeneity problems and without gaps. The last factor to be considered (discussed below) is the urban heat island.

THE ADJUSTMENT OF THE URBAN HEAT ISLAND

The urban heat island

Small-scale urban warming is a well-known phenomenon in climatology. Its principal causes are the heat-storage capacity of buildings and streets, the quick removal of rainwater, the heat emissions from houses, vehicles and industry, and sometimes a reduced infrared radiative heat loss due to locally increased atmospheric turbidity. Several studies at different latitudes have found significant thermal differences between urban and rural observatories (Tereshchenko and Filonov, 2001; Figuerola and Mazzeo, 1998; Shahgedanova et al., 1997; Landsberg, 1981; Oke, 1973), and the state of knowledge about the urban heat island is described in Arnfield (2003). An adjustment of the urban effect is necessary to attain realistic results concerning thermal evolution and changes for large cities. The urban effect is usually greatest at the time of the minimum temperatures in the early morning and under anticyclonic conditions (Montávez et al., 2000; Unger, 1996; Colacino and Lavagnini, 1982). Consequently, it also depends on the season: Yagüe et al.
(1991) found a strong urban effect in Madrid in summer and a weaker effect in spring. In the present study, the adjustment is made on a long-term basis, without a need to discriminate between seasons or weather types. Hence, the aim is to establish a quantitative relationship between the urban population (as a measure of city size) and urban-rural temperature differences.

The empirical urban adjustment

Unfortunately, the Spanish data coverage is not sufficient for a study of this relationship on a solid statistical basis, and thus an empirical result is used. After reviewing the literature (the aforementioned studies and, moreover, Kukla et al. (1986), Colacino and Rovelli (1983), Moreno García (1994), Portman (1993), Kozuchowski et al. (1994) and Karl et al. (1988)), we adopted from the latter study the relation

$$\Delta T_{\mathrm{urb-rur}} = a \cdot \mathrm{popul}_{\mathrm{urb}}^{0.45} \tag{5}$$

where $\mathrm{popul}_{\mathrm{urb}}$ represents the population of the city. This result is based on a large number of data series (more than 1200) and was recently confirmed by Englehart and Douglas (2003). The most consistent results were found with a coefficient $a = (2.39 \pm 0.70) \times 10^{-3}$ K (95% confidence) for minimum temperatures (Karl et al., 1988). For maximum temperatures, the results did not differ significantly from zero. Furthermore, an urban effect on the maxima was
neither theoretically well explained nor clearly confirmed in the above works, although Philandras et al. (1999) reported an urban effect in Athens that was stronger in the maximum than in the minimum temperatures. Hence, in the present study, this empirical urban correction is applied to the local representative series of each region, as a function of its population, but only for the minimum temperatures. The urban thermal effect is generally weaker in Europe than in North America, and the corresponding adjustment factor of 0.7 (Karl et al., 1988) is applied and discussed for Spain.

An alternative adjustment for the minimum temperatures in Madrid

For Madrid, an alternative correction for the minimum temperatures is constructed with the data of Madrid Retiro and Toledo. The latter observatory is far enough from Madrid to be outside the urban area, but near enough to have almost identical climatic conditions. Segovia and Ávila are discarded because a mountain range separates these observatories from Madrid, while Guadalajara is rejected because of its limited data coverage (Table I). The differences of the minimum temperatures, Madrid − Toledo, show the urban influence, with a clear and highly significant increase between 1930 and 1970 (Figure 3), roughly synchronous with the urban growth of Madrid (the growth of Toledo is considered negligible). The linear regression

$$\Delta T_{\mathrm{urb-rur}} = p + q \cdot \mathrm{pop}_{\mathrm{Madrid}} \tag{6}$$

links the differences in minimum temperature and the population (the parameters p and q are given in Table II). For the minimum temperatures in Madrid, both urban corrections can be compared (the empirical urban correction and the approach with the data of Madrid and Toledo). The resulting coefficient q in Equation (6) defines an urban correction of approximately 0.35 °C for each million inhabitants, whereas the empirical adjustment of Equation (5), applied to Madrid, is larger: even with the reduction factor of 0.7, the correction is about 1.27 °C for the first million inhabitants, 0.46 °C for the second and 0.35 °C for the third million.

Table II. Row I: parameters of the linear regression of Equation (6) for the differences Madrid − Toledo in minimum temperatures, as a function of the population of Madrid (in millions). Row II: like I, but for the 8-year moving averages; ε is the total error of the estimation in the period […]. The coefficient in row II has a smaller error than the one in row I, due to the compensation of individual anomalies, and has been chosen to compute the adjustment.

        p (°C)        q (°C per 10^6 inhabitants)    ε (°C)
(I)     […] ± […]     […] ± […]                      […]
(II)    […] ± […]     […] ± […]                      […]

Figure 3. Population of Madrid (dotted line, right axis) and 8-year moving averages of the differences Madrid − Toledo in minimum temperatures (continuous line, left axis).

RESULTS

The compiled homogenized dataset: adjustments and rejected data

In this study, 43 monthly series of maximum and minimum temperatures, almost all available Spanish long-term series with coverage longer than 30 years, have been organized into seven regional groups and homogenized. The analysis of data quality confirmed widespread homogeneity problems. Adjustments were necessary in almost all series, although the criteria for the detection of inhomogeneities were severe (high significance and
redundancy levels). In some cases, long intervals (the maxima in León and the minima in Guadalajara and Cuenca) or entire series (the maximum temperatures in Valladolid and the minimum temperatures in Ciudad Real) were rejected because of a lack of homogeneity. On the whole, 59 (85) inhomogeneities were adjusted in the maximum (minimum) temperatures (Table II), with mean amplitudes of 1.00 °C (1.05 °C); in addition, there were many rejected intervals and individual data. On average, one adjustment was made for every 44.5 years (66 years), and a series of […] years required an approximate mean of two adjustments.

The temperature evolution of each region was then represented by its average anomalies and one local series. This dataset maximized the confidence, because all participating series were carefully analysed and adjusted for homogeneity. Interregional differences ([…] km scale) in the temperature evolution were resolved, although they were of second order compared to the common variability at the 1000 km scale. The subregional differences (<100 km) were of third order and impossible to resolve with these series, owing to the limited data quality and because the adjustments had intentionally mixed the data within one region.

The adjustments applied to the data series consist mainly of corrections of breaks (abrupt changes). Moreover, the series were also scanned for individual inhomogeneous and extreme data, and the gaps in the two representative series of each region were filled. Table II gives an overview of all the adjustments and, as an example, Table III lists the details of the adjustment for the maximum and the minimum temperatures in Madrid. As further examples of the results, Figures 4 and 5 show the monthly temperature anomalies before and after the homogenization process for four different series.
Some effects of the homogenization are clearly visible: in La Coruña (Figure 4(A), (B)), the net warming of the minima was too large, as a consequence of an inhomogeneous break of considerable amplitude in […]; in Seville (Figure 4(C), (D)), the 19th-century data were rejected because of the lack of simultaneous regional data, an important break in […] was adjusted and the data of three different series were unified (with adjustments); in the maxima in Madrid (Figure 5(A), (B)), the large break in […] impedes a reasonable analysis without homogenization; and in the minima in Madrid (Figure 5(C), (D)), the two breaks in […] and […] again are important, too. The increase in data homogeneity and quality for climate-change studies (the main goal of this study) is further investigated below in the sections 'An estimation of the error margins of raw and homogenized data' and 'A comparison of some results, based on raw and homogenized data'.

Figure 4. Monthly anomalies of the minimum temperatures in La Coruña and the maximum temperatures in Seville: raw data (left side) and homogenized data (right side; all series with distance-weighted least-squares fits).

Figure 5. Monthly anomalies of maximum and minimum temperatures in Madrid: raw data (left side) and homogenized data (right side; all series with distance-weighted least-squares fits).

An estimation of the error margins of raw and homogenized data

The instrumental error in temperature measurement was of the order of 0.1 °C (Linacre, 1992; Servicio Meteorológico Nacional, 1956) and increased to around 0.2 °C when differences between two series were concerned (assuming linear error propagation). Any linear homogeneity adjustment based on the same data type added an error of the same amplitude. A long-term series of roughly one century required an average of two adjustments, and the mean margin of this error (instrumental plus homogenization) increased to […] °C, an amplitude of the order of the mean global warming of the 20th century (0.6 °C; IPCC, 2001). This comparison illustrates the crucial role of data quality.

On the other hand, the mean amplitude of the adjustments (around 1 °C) defined the mean error of the inhomogeneities and the uncertainty in the raw-data series, in addition to the instrumental error of 0.1 °C. A long series had between one and five inhomogeneous breaks, with a statistical average of around two. The errors tended either to cancel each other or to accumulate (partially or entirely). In the latter case, the total error (instrumental plus inhomogeneities) could exceed 1 °C, or sometimes even 2 °C. This hampered the detection of climate changes at any reasonable confidence level.

The critical role of data homogeneity was confirmed in an extensive analysis of 20th-century surface-air temperature and precipitation data from the European Climate Assessment, in Wijngaard et al. (2003). The authors organized the quality of the tested series into the classes useful, doubtful, and suspect.
In the period […] ([…]), 94% (61%) of the temperature series are labelled doubtful or suspect. Referring to trends and the variability of weather extremes, the authors state that 'Clearly, this type of analysis is limited by the degree of inhomogeneity of the data.' To compare these statements with the data of the present study, the following paragraph summarizes some comparisons between temperature changes in raw and adjusted data.

Table III. Details of the adjustments of the maximum and the minimum temperatures in Madrid. The iteration steps 1 and 3 are listed because of the other series, although nothing was done in the Madrid series. The adjusted values are added to all data before the break. The symbol s/n is a signal-to-noise ratio: the quotient of the adjusted value and the standard deviation of the base interval.

Maximum temperatures / Minimum temperatures

A. Adjustment of inhomogeneous breaks and rejection of inhomogeneous data

Iteration step 1
Iteration step 2
  break: […], adjusted with data of Burgos, Salamanca, Segovia, Soria and Albacete, base interval: […], value: 1.81 °C, s/n = […]
  anomaly October 1894 rejected.
  break: […], adjusted with data of Salamanca, Segovia, Soria, Toledo, Ciudad Real and Cuenca, base interval: […], value: […] °C, s/n = 0.45
Iteration step 3
Iteration step 4
  break: Nov. […] March 1937, adjusted with data of Burgos, Palencia, Salamanca, Segovia, Toledo, Cuenca; interval: Jan. […] Dec. 1951, value: 0.65 °C, s/n = […]
  anomaly August 1993 rejected.
  break: […], adjusted with data of Burgos, Salamanca, Segovia, Soria and Albacete, base interval: […], value: […] °C, s/n = […]
  break: […], adjusted with data of Burgos, Salamanca and Albacete, base interval: […], value: 0.71 °C, s/n = […]

B. Filling of data gaps with data from 4–11 series (of the central plains) and base intervals of approximately 20 years around the missing data.

  Feb. and Dec. 1875, July 1878, July 1879, July Dec. 1875, July 1897, Oct. 1894, Sept. 1928, 1897, June 1905, June 1922, Nov. Dec. 1936, Nov. Dec. 1936, Jan Feb Apr Oct. 1937, Jan Feb April Oct. 1937, March and April MarApr 1939, Aug […], Aug […]

A comparison of some results, based on raw and homogenized data

To compare the thermal changes detected with raw and homogenized data, we applied a t-test (autocorrelation-corrected) to the temperature means of the first and last 30-year intervals of the 20th century, in six examples (Table IV). After homogenization, the regional net temperature changes were substantially more similar and more consistent between the local and the mean representative series.
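The text does not specify how the t-test was corrected for autocorrelation. As a hedged illustration, the sketch below uses one common choice, an effective sample size derived from the lag-1 autocorrelation, $n_{\mathrm{eff}} = n\,(1 - r_1)/(1 + r_1)$, inside an unequal-variance t statistic. This correction, the function names and the synthetic data are assumptions of this example and should not be read as the procedure behind Table IV.

```python
import math
from statistics import fmean


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    mu = fmean(x)
    num = sum((a - mu) * (b - mu) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - mu) ** 2 for a in x)
    return num / den if den > 0 else 0.0


def effective_size(x):
    """Effective number of independent values, n * (1 - r1) / (1 + r1).

    Negative r1 is set to zero (no inflation of n) and r1 is capped at 0.99
    to avoid a vanishing sample size; both are choices of this sketch.
    """
    r1 = min(max(lag1_autocorr(x), 0.0), 0.99)
    return len(x) * (1.0 - r1) / (1.0 + r1)


def sample_variance(x):
    mu = fmean(x)
    return sum((a - mu) ** 2 for a in x) / (len(x) - 1)


def corrected_t(first, last):
    """t statistic for the change in mean between two intervals,
    using effective sample sizes in the standard error."""
    se = math.sqrt(sample_variance(first) / effective_size(first)
                   + sample_variance(last) / effective_size(last))
    return (fmean(last) - fmean(first)) / se


# Hypothetical 30-value intervals with a mean shift of 1.0 and no persistence.
first = [0.0, 1.0] * 15
last = [1.0, 2.0] * 15
t_value = corrected_t(first, last)

# A strongly persistent series carries far fewer independent values.
persistent = [0.0] * 15 + [1.0] * 15
n_eff = effective_size(persistent)
```

Positive persistence reduces the effective sample size, widens the standard error and therefore lowers the significance of a given change in the mean, which is why the correction matters for smoothly varying temperature series.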
In several cases, even the qualitative results and their significance levels differed: in the maximum temperatures of the Cantabrian and the Ebro valley and in the minima of the Mediterranean, there was a lack of consistency between the raw local and mean series, where only one series showed a highly significant change. In all cases, the degree of consistency in the homogenized data was at least similar, and usually higher. These results confirmed the substantially larger errors of the raw series and suggested that an analysis based on the raw data may in many cases not be valid if a reasonable confidence level is required. Furthermore, according to the section 'An estimation of the error margins of raw and homogenized data', the homogenization procedure improves the data quality by reducing the error margins (eliminating the error of the order of 1 °C due to the inhomogeneities) and is strongly recommended as a preliminary step before analysing the data.

The empirical urban correction and the approach with the data of Madrid and Toledo

To test the performance of both urban corrections for Madrid, the differences between the average series of central Spain (without Madrid) and Madrid were compared (Figure 6). The average series stemmed from medium-sized towns with an average population of around […], for which the urban effect was negligible compared to Madrid. The decreasing trend of the differences without any urban correction was, at least partly, owing to the urban effect (the climatic differences in central Spain were not large). With the empirical correction, this trend was reversed, signifying an overadjustment of the urban effect. The series C, with the alternative adjustment (Madrid − Toledo), still showed an increase, but one clearly weaker and less significant than in B, indicating a more realistic correction, although still slightly too large. The urban effect in Madrid was smaller than it would be theoretically, following the population data and the comparison with Toledo.
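The different behaviour of the two urban corrections can be made concrete with a short calculation. The sketch below takes the quoted 1.27 °C for the first million inhabitants as given (so the coefficient and the reduction factor are folded into that number), applies the exponent 0.45 of Equation (5), and compares the increments per additional million with the constant slope of about 0.35 °C per million from Equation (6). The function names are hypothetical, and the intercept p of the regression is ignored because only increments are compared.

```python
def powerlaw_correction(pop_millions, first_million=1.27, exponent=0.45):
    """Empirical correction scaled so that one million inhabitants gives 1.27 degC."""
    return first_million * pop_millions ** exponent


def linear_increment(q=0.35):
    """Madrid-Toledo regression: the same increment for every additional million."""
    return q


# Increment of the power-law correction for the first, second and third million.
increments = [
    powerlaw_correction(k) - powerlaw_correction(k - 1) for k in (1, 2, 3)
]
rounded = [round(v, 2) for v in increments]   # [1.27, 0.46, 0.35]

# Cumulative change between zero and three million inhabitants.
powerlaw_total = powerlaw_correction(3)       # about 2.08 degC
linear_total = 3 * linear_increment()         # about 1.05 degC
```

The increments reproduce the values quoted for the empirical correction, and the cumulative change is roughly twice that implied by the regression slope, which is consistent with the overadjustment seen when the empirical correction is applied to Madrid.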
The Madrid data were compiled from the Retiro park observatory, located in the urban centre, but close to the edge of this green area of about 1.2 km². The minima at dawn were very probably lowered, thus attenuating the urban effect. García Hernández et al. (1997) stated a clear influence of
30 June 2015 Urban Indicators (Urban Audit) Year 2015 Pozuelo de Alarcón is the city with the highest level of income and lowest unemployment rate of the 109 analyzed Sanlúcar de Barrameda has the highest
More informationEXCEL EXERCISE AND ACCELERATION DUE TO GRAVITY
EXCEL EXERCISE AND ACCELERATION DUE TO GRAVITY Objective: To learn how to use the Excel spreadsheet to record your data, calculate values and make graphs. To analyze the data from the Acceleration Due
More informationThe Risk Driver Approach to Project Schedule Risk Analysis
The Risk Driver Approach to Project Schedule Risk Analysis A Webinar presented by David T. Hulett, Ph.D. Hulett & Associates, LLC To the College of Performance Management April 18, 2013 2013 Hulett & Associates,
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationhttp://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions
A Significance Test for Time Series Analysis Author(s): W. Allen Wallis and Geoffrey H. Moore Reviewed work(s): Source: Journal of the American Statistical Association, Vol. 36, No. 215 (Sep., 1941), pp.
More informationDo Commodity Price Spikes Cause LongTerm Inflation?
No. 111 Do Commodity Price Spikes Cause LongTerm Inflation? Geoffrey M.B. Tootell Abstract: This public policy brief examines the relationship between trend inflation and commodity price increases and
More informationPITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU
PITFALLS IN TIME SERIES ANALYSIS Cliff Hurvich Stern School, NYU The t Test If x 1,..., x n are independent and identically distributed with mean 0, and n is not too small, then t = x 0 s n has a standard
More informationTime Series Analysis
Time Series Analysis Forecasting with ARIMA models Andrés M. Alonso Carolina GarcíaMartos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and GarcíaMartos (UC3MUPM)
More informationAustralian Temperature Variations  An Alternative View
Australian Temperature Variations  An Alternative View John McLean 11 October 2007 Appendix by Dr. Thomas Quirk added 17 October 2007 Australia's temperature since 1950 (see Figure 1) is usually described
More informationAP Physics 1 and 2 Lab Investigations
AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks
More informationAdvanced Forecasting Techniques and Models: ARIMA
Advanced Forecasting Techniques and Models: ARIMA Short Examples Series using Risk Simulator For more information please visit: www.realoptionsvaluation.com or contact us at: admin@realoptionsvaluation.com
More informationMultiple Regression: What Is It?
Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in
More informationDeghosting by kurtosis maximisation in practice Sergio Grion*, Rob Telling and Janet Barnes, Dolphin Geophysical
in practice Sergio Grion*, Rob Telling and Janet Barnes, Dolphin Geophysical Summary Adaptive deghosting estimates the parameters of the physical process determining the ghost reflection, which are not
More informationCritical Limitations of Wind Turbine Power Curve Warranties
Critical Limitations of Wind Turbine Power Curve Warranties A. Albers Deutsche WindGuard Consulting GmbH, Oldenburger Straße 65, D26316 Varel, Germany Email: a.albers@windguard.de, Tel: (++49) (0)4451/951515,
More informationANNUAL QUALITY REPORT
REPUBLIC OF SLOVENIA ANNUAL QUALITY REPORT FOR THE CONSUMER OPINION SURVEY FOR 2012 Prepared by: Martin Bajželj, Marta Arnež Date: October 2013 1/12 Table of contents 0 Basic Data... 3 1 Relevance... 5
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationThe Fundación Secretariado Gitano
The Fundación Secretariado Gitano Mission, values and aims The Fundación Secretariado Gitano is a nonprofit intercultural social organisation which provides services for the development of the Roma community
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More informationReport of for Chapter 2 pretest
Report of for Chapter 2 pretest Exam: Chapter 2 pretest Category: Organizing and Graphing Data 1. "For our study of driving habits, we recorded the speed of every fifth vehicle on Drury Lane. Nearly every
More informationPromotional Forecast Demonstration
Exhibit 2: Promotional Forecast Demonstration Consider the problem of forecasting for a proposed promotion that will start in December 1997 and continues beyond the forecast horizon. Assume that the promotion
More informationAPPLICATION OF DISCRETE EVENT SIMULATION FOR VALIDATING A STORAGE INFRASTRUCTURE
APPLICATION OF DISCRETE EVENT SIMULATION FOR VALIDATING A STORAGE INFRASTRUCTURE CASE STUDY: SIMULATION OF A FACILITY TO SUPPLY AN AIRPORT StocExpo 2015 CLH is renown for its efficient and effective transportation
More informationProbabilistic Forecasting of MediumTerm Electricity Demand: A Comparison of Time Series Models
Fakultät IV Department Mathematik Probabilistic of MediumTerm Electricity Demand: A Comparison of Time Series Kevin Berk and Alfred Müller SPA 2015, Oxford July 2015 Load forecasting Probabilistic forecasting
More informationVariables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.
The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide
More informationSTATISTICAL ANALYSIS OF UBC FACULTY SALARIES: INVESTIGATION OF
STATISTICAL ANALYSIS OF UBC FACULTY SALARIES: INVESTIGATION OF DIFFERENCES DUE TO SEX OR VISIBLE MINORITY STATUS. Oxana Marmer and Walter Sudmant, UBC Planning and Institutional Research SUMMARY This paper
More informationTesting for serial correlation in linear paneldata models
The Stata Journal (2003) 3, Number 2, pp. 168 177 Testing for serial correlation in linear paneldata models David M. Drukker Stata Corporation Abstract. Because serial correlation in linear paneldata
More informationPrediction and Confidence Intervals in Regression
Fall Semester, 2001 Statistics 621 Lecture 3 Robert Stine 1 Prediction and Confidence Intervals in Regression Preliminaries Teaching assistants See them in Room 3009 SHDH. Hours are detailed in the syllabus.
More informationRegression analysis in practice with GRETL
Regression analysis in practice with GRETL Prerequisites You will need the GNU econometrics software GRETL installed on your computer (http://gretl.sourceforge.net/), together with the sample files that
More informationData Processing Flow Chart
Legend Start V1 V2 V3 Completed Version 2 Completion date Data Processing Flow Chart Data: Download a) AVHRR: 19811999 b) MODIS:20002010 c) SPOT : 19982002 No Progressing Started Did not start 03/12/12
More informationTime Series Analysis
Time Series Analysis Time series and stochastic processes Andrés M. Alonso Carolina GarcíaMartos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and GarcíaMartos
More informationClimatography of the United States No. 20 19712000
Climate Division: CA 6 NWS Call Sign: SAN Month (1) Min (2) Month(1) Extremes Lowest (2) Temperature ( F) Lowest Month(1) Degree s (1) Base Temp 65 Heating Cooling 100 Number of s (3) Jan 65.8 49.7 57.8
More informationTHE STATISTICAL TREATMENT OF EXPERIMENTAL DATA 1
THE STATISTICAL TREATMET OF EXPERIMETAL DATA Introduction The subject of statistical data analysis is regarded as crucial by most scientists, since errorfree measurement is impossible in virtually all
More informationAge to Age Factor Selection under Changing Development Chris G. Gross, ACAS, MAAA
Age to Age Factor Selection under Changing Development Chris G. Gross, ACAS, MAAA Introduction A common question faced by many actuaries when selecting loss development factors is whether to base the selected
More informationA Multiplicative Seasonal BoxJenkins Model to Nigerian Stock Prices
A Multiplicative Seasonal BoxJenkins Model to Nigerian Stock Prices Ette Harrison Etuk Department of Mathematics/Computer Science, Rivers State University of Science and Technology, Nigeria Email: ettetuk@yahoo.com
More informationAn Assessment of Prices of Natural Gas Futures Contracts As A Predictor of Realized Spot Prices at the Henry Hub
An Assessment of Prices of Natural Gas Futures Contracts As A Predictor of Realized Spot Prices at the Henry Hub This article compares realized Henry Hub spot market prices for natural gas during the three
More informationResidential Market Report
Residential Market Report Madrid City and the Metropolitan Area of Madrid A g u i r r e N e w m a n June 2014 A G E N D A 01 02 03 04 INTRODUCTION AND METHODOLOGY GEOGRAPHIC DISTRIBUTION CONCLUSIONS OF
More informationExamining the Recent Pause in Global Warming
Examining the Recent Pause in Global Warming Global surface temperatures have warmed more slowly over the past decade than previously expected. The media has seized this warming pause in recent weeks,
More informationAdvanced timeseries analysis
UCL DEPARTMENT OF SECURITY AND CRIME SCIENCE Advanced timeseries analysis Lisa Tompson Research Associate UCL Jill Dando Institute of Crime Science l.tompson@ucl.ac.uk Overview Fundamental principles
More informationPlate Tectonics GIS Activities
Plate Tectonics GIS Activities Introduction A Geographic Information System (GIS) is a system designed to capture, store, manipulate, analyse, manage, and present all types of geographical data. In the
More informationTHE DEVELOPMENT OF A NEW DATASET OF SPANISH DAILY ADJUSTED TEMPERATURE SERIES (SDATS) (1850 2003)
INTERNATIONAL JOURNAL OF CLIMATOLOGY Int. J. Climatol. 26: 1777 1802 (2006) Published online 5 May 2006 in Wiley InterScience (www.interscience.wiley.com).1338 THE DEVELOPMENT OF A NEW DATASET OF SPANISH
More informationContent Sheet 71: Overview of Quality Control for Quantitative Tests
Content Sheet 71: Overview of Quality Control for Quantitative Tests Role in quality management system Quality Control (QC) is a component of process control, and is a major element of the quality management
More informationHOW WILL CLIMATE CHANGE AFFECT THE COMPETITIVENESS OF EUROPEAN BEACH AND SKI TOURISM?
HOW WILL CLIMATE CHANGE AFFECT THE COMPETITIVENESS OF EUROPEAN BEACH AND SKI TOURISM? Assessing the impact of adaptation on the number of overnight stays Executive summary With climate change, mean temperatures
More informationDevelopment of new hybrid geoid model for Japan, GSIGEO2011. Basara MIYAHARA, Tokuro KODAMA, Yuki KUROISHI
Development of new hybrid geoid model for Japan, GSIGEO2011 11 Development of new hybrid geoid model for Japan, GSIGEO2011 Basara MIYAHARA, Tokuro KODAMA, Yuki KUROISHI (Published online: 26 December 2014)
More informationPhysics Lab Report Guidelines
Physics Lab Report Guidelines Summary The following is an outline of the requirements for a physics lab report. A. Experimental Description 1. Provide a statement of the physical theory or principle observed
More informationStatistical matching: Experimental results and future research questions
Statistical matching: Experimental results and future research questions 2015 19 Ton de Waal Content 1. Introduction 4 2. Methods for statistical matching 5 2.1 Introduction to statistical matching 5 2.2
More informationPredicting daily incoming solar energy from weather data
Predicting daily incoming solar energy from weather data ROMAIN JUBAN, PATRICK QUACH Stanford University  CS229 Machine Learning December 12, 2013 Being able to accurately predict the solar power hitting
More information