Probabilistic Forecasts of Wind Speed: Ensemble Model Output Statistics using Heteroskedastic Censored Regression

Thordis L. Thorarinsdottir (1, 2) and Tilmann Gneiting (1)

(1) University of Washington, Seattle, Washington, USA
(2) University of Aarhus, Aarhus, Denmark

Technical Report no. 546
Department of Statistics, University of Washington
December 2008

Abstract

As wind energy penetration continues to grow, there is a critical need for probabilistic forecasts of wind resources. In addition, there are many other societally relevant uses for forecasts of wind speed, ranging from aviation to ship routing and recreational boating. Over the past two decades, ensembles of numerical weather prediction (NWP) models have been developed, in which multiple estimates of the current state of the atmosphere are used to generate a collection of deterministic forecasts. However, even state-of-the-art ensemble systems are uncalibrated and biased. Here we propose a novel way of statistically post-processing NWP ensembles for wind speed using heteroskedastic censored (Tobit) regression, where location and spread derive from the ensemble forecast. The resulting ensemble model output statistics (EMOS) method is applied to 48-hour ahead forecasts of maximum wind speed over the North American Pacific Northwest in 2003 using the University of Washington Mesoscale Ensemble. The statistically post-processed EMOS density forecasts turn out to be calibrated and sharp, and result in substantial improvement over the unprocessed NWP ensemble or climatological reference forecasts.

Key words and phrases: Continuous ranked probability score; Density forecast; Ensemble system; Numerical weather prediction; Heteroskedastic censored regression; Tobit model; Wind energy.

1 Introduction

Accurate forecasts of wind speed are of critical importance in many applications of societal relevance, ranging from severe weather warnings for the general public to risk assessment and decision making in aviation, ship routing, recreational boating, and agriculture. Wind storms often lead to power failures and can cause extensive damage as well as threats to human life. For example, during the December 14-15, 2006 Hanukkah Eve Wind Storm over the Pacific Northwest, including the northwestern United States and southwestern Canada, more than 1.3 million customers lost power for up to a week, at least 13 individuals lost their lives, and estimates of damage reached a billion dollars (Mass 2008). Figure 1 shows a scene at the second author's home in the aftermath of the storm.

Principled risk management depends on probabilistic forecasts, which take the form of predictive probability distributions for future quantities or events (National Research Council 2006; Gneiting 2008a). Farmers might be interested in the chance of the wind being calm enough for them to spray pesticides, while recreational boaters may want to know how likely it is that the wind speed will be substantial. Arguably, the most pronounced need for probabilistic forecasts of wind resources stems from the global proliferation of wind energy, whose installed capacity increased by 27% in 2007 alone (Global Wind Energy Council 2008). Wind power provides an attractive, emissions-free alternative to fossil fuels. However, it is an intermittent source of energy, and its continued spread hinges on the ability to reliably predict wind speed (Genton and Hering 2007). From an economic perspective, under- and over-prediction of wind power result in heavy financial penalties in deregulated energy markets.
The optimal point forecast depends on current, rapidly changing market features and typically is a quantile of the predictive distribution (Roulston, Kaplan, Hardenberg, and Smith 2003; Pinson, Chevallier, and Kariniotakis 2007). More generally, access to the full predictive distribution provides users with the ability to tailor a point forecast or decision to the loss structure at hand (Diebold, Gunther, and Tay 1998; Gneiting 2008b).

Here we are concerned with probabilistic forecasts of surface wind speed at a prediction horizon of 48 hours. Short-range weather forecasts at prediction horizons of only a few hours are typically done by purely statistical approaches, using time series models or neural networks (Brown, Katz, and Murphy 1984; Kretzschmar, Eckert, Cattani, and Eggimann 2004; Gneiting, Larson, Westrick, Genton, and Aldrich 2006; Costa, Crespo, Navarro, Lizcano, Madsen, and Feitosa 2008). In the medium range, at prediction horizons of one to ten days, forecasts based on numerical weather prediction (NWP) models outperform purely statistical forecasts (Campbell and Diebold 2005). However, NWP forecasts are deterministic and do not account for the uncertainty that arises from incomplete initial estimates of atmospheric conditions, or imperfections and discretisation in the numerical model. To take these sources of uncertainty into account, a commonly used approach is to employ an ensemble of NWP forecasts, where the ensemble members differ from each other by the initial conditions and/or the numerical model being used (Palmer 2002; Gneiting and Raftery 2005). For almost all operational ensemble systems, a positive association between the forecast error and the spread in the forecast ensemble has been established. However, even state-of-the-art ensembles are uncalibrated and subject to systematic bias, in addition to being limited by the size of the ensemble, which typically comprises five to 50 members.

Figure 1: Tree damage to a home on View Ridge in Seattle, Washington after the Hanukkah Eve Wind Storm of December 2006.

In view of these limitations, ensemble forecasts call for some form of statistical post-processing before a predictive distribution is passed on to the user. For wind speed and wind power, the most common approach is to use quantile regression (Bremnes 2004; Nielsen, Madsen, and Nielsen 2006; Møller, Nielsen, and Madsen 2008). This method yields quantile and interval forecasts, but not a full predictive distribution.

Here we propose a simple post-processing technique that uses heteroskedastic censored regression to obtain predictive distributions. This builds on the ensemble model output statistics (EMOS) approach of Gneiting, Raftery, Westveld, and Goldman (2005), who employ Gaussian predictive distributions for surface temperature, with a mean that is linear in the ensemble member forecasts, and a variance that is an affine function of the ensemble variance. This approach is simple and powerful (Wilks 2006a; Wilks and Hamill 2007), but does not apply directly to non-negative weather quantities, such as wind speed. To address the non-negativity of the predictand, we adapt the EMOS technique and employ truncated normal predictive distributions with a cut-off at zero. This is akin to the heteroskedastic censored (Tobit) regression model (Tobin 1958; Chen and Khan 2000) and yields predictive distributions that condition on the NWP ensemble, while correcting for biases and dispersion errors.
To give an example, Figure 2 shows a 48-hour ahead EMOS density forecast of maximum wind speed valid June 14, 2003 at The Dalles, Oregon in the North American Pacific Northwest, using the eight-member University of Washington Mesoscale Ensemble (Eckel and Mass 2005). The EMOS density forecast, which is estimated on a 40-day rolling training period, corrects for the low bias and under-dispersion in the NWP ensemble.

Figure 2: 48-hour ahead Local EMOS density forecast of maximum wind speed valid June 14, 2003 at The Dalles, Oregon. The black lines represent the eight members of the University of Washington Mesoscale Ensemble. The red lines border the 77.8% central prediction interval for the EMOS density forecast, which is shown in grey. The broken red line represents the EMOS median forecast, and the blue line the verifying observation, at 18 knots.

The remainder of the paper is organized as follows. Section 2 introduces the University of Washington Mesoscale Ensemble (UWME) and describes the EMOS post-processing technique. In Section 3 we apply the EMOS technique to create daily 48-hour ahead forecasts of surface wind speed over the Pacific Northwest in the calendar year 2003, based on the UWME system. In these experiments, the EMOS density forecasts turn out to be calibrated and sharp and compare favorably to reference forecasts. The paper closes with a discussion of potential extensions and future challenges in Section 4.

2 Data and methods

2.1 Forecast and observation data

We consider 48-hour ahead forecasts of maximum wind speed in the Pacific Northwest in the period from 1 November 2002 through 31 December 2003, using the eight-member University of Washington Mesoscale Ensemble (UWME) system (Eckel and Mass 2005). The ensemble member forecasts rely on initial conditions supplied by eight different operational forecast centers that drive the fifth-generation Pennsylvania State University-National Center for Atmospheric Research (PSU-NCAR) Mesoscale Model (MM5) (Grell, Dudhia, and Stauffer 1995). The MM5 model runs on a 12 kilometer grid over the Pacific Northwest which, in general, does not match the observation locations.
Forecasts at observation sites are thus created by bilinear interpolation from the model grid, as is common practice in the meteorological community. More sophisticated interpolation schemes are unlikely to do any better and do not justify the extra computational effort (Shao, Stein, and Ching 2007; Jun, Knutti, and Nychka 2008).

Figure 3: Surface airway observation (SAO) stations at airports in the Pacific Northwest, including the Canadian provinces of British Columbia (BC) and Alberta (AB), and the US states of Washington (WA), Oregon (OR), Idaho (ID), California (CA) and Nevada (NV). The arrows indicate the View Ridge area in Seattle, Washington (see Figure 1) and the city of The Dalles, Oregon (see Figure 10).

Our database contains the eight ensemble member forecasts and verifying observations of maximum wind speed at 107 surface airway observation (SAO) stations in the United States and Canada, as illustrated in Figure 3. Maximum wind speed is defined as the maximum of the hourly instantaneous wind speed 10 meters above ground over the previous eighteen hours, where an hourly instantaneous wind speed is a 2-minute average from the period of two minutes before the hour to on the hour. Wind speed observations are rounded to the nearest whole knot when recorded, except that wind speeds below one knot are recorded as zero. One knot is equal to approximately 0.514 meters per second, or 1.151 miles per hour.

For the calendar year 2003, data are available for 291 days, for a total of 29,542 individual forecast cases at 107 meteorological stations in the Pacific Northwest. Only 43 of these observations are at one knot or lower. The cases from 2002 were used for training purposes only. All data were subject to the quality control procedures described by Baars (2005).
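The bilinear interpolation from the model grid to a station location can be sketched as follows. This is a minimal illustration and not the UWME processing code: the function name, the unit grid spacing, and the sample values are assumptions made for the example.

```python
import math


def bilinear_interpolate(grid, x, y):
    """Interpolate a gridded field at fractional grid coordinates (x, y).

    grid[j][i] holds the forecast at integer grid point (i, j). The grid
    spacing is taken as one unit in both directions (an assumption here).
    """
    # Lower-left corner of the enclosing cell, clamped so the cell exists.
    i0 = min(int(math.floor(x)), len(grid[0]) - 2)
    j0 = min(int(math.floor(y)), len(grid) - 2)
    tx, ty = x - i0, y - j0  # fractional position inside the cell
    # Interpolate along x on the two bounding rows, then along y.
    lower = (1 - tx) * grid[j0][i0] + tx * grid[j0][i0 + 1]
    upper = (1 - tx) * grid[j0 + 1][i0] + tx * grid[j0 + 1][i0 + 1]
    return (1 - ty) * lower + ty * upper


# Hypothetical 2 x 2 patch of forecast wind speeds in knots.
patch = [[10.0, 14.0],
         [12.0, 20.0]]
station_forecast = bilinear_interpolate(patch, 0.25, 0.5)
```

The interpolated value is a weighted average of the four surrounding grid points, so it always lies between the smallest and largest of them.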

2.2 Ensemble model output statistics using heteroskedastic censored regression

Our goal here is to adapt the ensemble model output statistics (EMOS) post-processing approach of Gneiting et al. (2005) to non-negative weather variables, such as wind speed. The name stems from the term model output statistics (MOS), which is used by atmospheric scientists to refer to regression approaches that use output from NWP models as predictor variables (Glahn and Lowry 1972; Wilks 2006b). Traditionally, MOS techniques have been used for point or probability of precipitation forecasts from a single NWP model, or for point forecasts from ensembles of seasonal weather or climate models (Krishnamurti et al. 1999; Kharin and Zwiers 2002).

Specifically, let $X_1, \ldots, X_k$ denote an ensemble of individually distinguishable forecasts for a univariate continuous quantity $Y$ that takes values on $\mathbb{R}_+$, the non-negative real axis. Here we think of $Y$ as wind speed, but the method applies more generally. To address the non-negativity of the predictand, we follow Gneiting et al. (2006) and employ a truncated normal predictive distribution with a cutoff at zero, namely

$$\mathcal{N}_0(\mu, \sigma^2) \quad \text{where} \quad \mu = a + b_1 X_1 + \cdots + b_k X_k \quad \text{and} \quad \sigma^2 = c + d S^2. \tag{1}$$

The location parameter, $\mu$, is a linear function of the ensemble member forecasts, and the spread parameter, $\sigma^2$, is an affine function of the ensemble variance,

$$S^2 = \frac{1}{k} \sum_{i=1}^{k} (X_i - \bar{X})^2, \quad \text{where} \quad \bar{X} = \frac{1}{k} \sum_{i=1}^{k} X_i.$$

The EMOS predictive density for the future weather quantity $Y$ thus becomes

$$f(y) = \left[ \frac{1}{\sigma} \, \varphi\!\left( \frac{y - \mu}{\sigma} \right) \right] \Big/ \, \Phi\!\left( \frac{\mu}{\sigma} \right)$$

for $y > 0$ and $f(y) = 0$ otherwise, where $\varphi$ and $\Phi$ denote the standard normal density function and standard normal cumulative distribution function, respectively. To ensure that (1) specifies a valid probability distribution, the spread parameters $c$ and $d$ need to be non-negative.
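To make the predictive model (1) concrete, the following sketch computes the location and spread from an ensemble and evaluates the truncated normal density. It is an illustration only: the function names, the eight example member forecasts, and the parameter values are made up, not taken from the fitted models in this paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # pdf plays the role of phi, cdf the role of Phi


def emos_location_spread(members, a, b, c, d):
    """Return (mu, sigma^2) of model (1) for one forecast case."""
    k = len(members)
    mean = sum(members) / k
    # Ensemble variance S^2 with divisor k, as in the definition above.
    s2 = sum((x - mean) ** 2 for x in members) / k
    mu = a + sum(b_i * x_i for b_i, x_i in zip(b, members))
    sigma2 = c + d * s2
    return mu, sigma2


def truncated_normal_pdf(y, mu, sigma):
    """Density of N_0(mu, sigma^2): a normal density cut off at zero
    and renormalized by the mass Phi(mu / sigma) above zero."""
    if y <= 0:
        return 0.0
    z = (y - mu) / sigma
    return STD_NORMAL.pdf(z) / (sigma * STD_NORMAL.cdf(mu / sigma))


# Hypothetical eight-member ensemble (knots) and made-up parameter values.
members = [9.0, 11.0, 10.0, 12.0, 8.0, 10.0, 11.0, 9.0]
mu, sigma2 = emos_location_spread(members, a=2.0, b=[0.125] * 8, c=9.0, d=1.0)
density_at_15 = truncated_normal_pdf(15.0, mu, sigma2 ** 0.5)
```

Note that the location parameter may be negative without harm: the renormalization by $\Phi(\mu/\sigma)$ keeps the density a valid probability density on the positive axis.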
We furthermore constrain the regression coefficients $b_1, \ldots, b_k$ to be non-negative; this does not deteriorate predictive performance, while enhancing interpretability and stabilizing the estimates (Gneiting et al. 2005). To incorporate these constraints into the EMOS model we write

$$b_1 = \beta_1^2, \; \ldots, \; b_k = \beta_k^2, \quad c = \gamma^2, \quad d = \delta^2, \tag{2}$$

where $\beta_1, \ldots, \beta_k$, $\gamma$ and $\delta$ are unconstrained real parameters. Thus, the EMOS density forecast is the fitted truncated normal distribution (1) with the constraints in (2).

The model parameters allow for direct interpretation, with the intercept $a$ being a bias correction term, and the regression coefficients $b_1, \ldots, b_k$ reflecting the overall contributions of the ensemble members to the predictive skill over a training period. The variance parameters $c$ and $d$ can be interpreted in terms of the relationship between ensemble spread and forecast skill (Whitaker and Loughe 1998). All else being equal, larger values of the parameter $d$ indicate a more pronounced spread-skill correlation. If spread and forecast skill are independent of each other, the parameter $d$ will be negligibly small. More general and more flexible parameterisations of the variance term in (1) are feasible, but seem unlikely to result in improved predictive performance.

The EMOS approach fits the general framework of heteroskedastic regression (Leslie, Kohn, and Nott 2007) and can be interpreted as heteroskedastic censored (Tobit) regression (Tobin 1958; Chib 1992; Chen and Khan 2000). Previous uses of truncated normal distributions for weather variables include applications to quantitative precipitation (Sansò and Guenni 1999; Allcroft and Glasbey 2003) and wind speed (Gneiting et al. 2006).

2.3 Estimation

The goal in probabilistic forecasting is to maximize the sharpness of the predictive distributions subject to calibration (Gneiting, Balabdaoui, and Raftery 2007; Pal 2009). Calibration refers to the statistical consistency between the predictive distributions and the observations. This goal should thus be reflected in the choice of the optimization method for the parameter estimation. One way to achieve this goal is to estimate the parameters by optimizing a proper scoring rule as a function of the parameter values (here, the real parameters $a$, $\beta_1, \ldots, \beta_k$, $\gamma$ and $\delta$) on training data. Gneiting et al. (2005) and Gneiting and Raftery (2007) refer to this general approach as optimum score estimation. The propriety of the scoring rule ensures that both calibration and sharpness are addressed.

The most popular scoring rules for density forecasts are the logarithmic score (Good 1952) and the continuous ranked probability score (Matheson and Winkler 1976; Gneiting and Raftery 2007). Both rules are proper and negatively oriented, that is, the smaller the better. The logarithmic score is simply the negative of the logarithm of the predictive density evaluated at the observation.
Thus, optimum score estimation based on the logarithmic score is simply maximum likelihood estimation. The continuous ranked probability score is defined as

$$\mathrm{crps}(F, y) = \int_{-\infty}^{\infty} \left( F(x) - \mathbb{1}\{x \geq y\} \right)^2 dx,$$

where $F$ is the predictive cumulative distribution function and $y$ is the verifying observation. For a truncated normal predictive distribution and an observation $y \geq 0$ we get

$$\mathrm{crps}\left( \mathcal{N}_0(\mu, \sigma^2), y \right) = \sigma \, \Phi\!\left( \frac{\mu}{\sigma} \right)^{-2} \left[ \frac{y - \mu}{\sigma} \, \Phi\!\left( \frac{\mu}{\sigma} \right) \left\{ 2 \, \Phi\!\left( \frac{y - \mu}{\sigma} \right) + \Phi\!\left( \frac{\mu}{\sigma} \right) - 2 \right\} + 2 \, \varphi\!\left( \frac{y - \mu}{\sigma} \right) \Phi\!\left( \frac{\mu}{\sigma} \right) - \frac{1}{\sqrt{\pi}} \, \Phi\!\left( \sqrt{2} \, \frac{\mu}{\sigma} \right) \right].$$

For a point forecast or Dirac measure, the continuous ranked probability score reduces to the absolute error (Grimit, Gneiting, Berrocal, and Johnson 2006).

Following Gneiting et al. (2005), we employ optimum score estimation based on the continuous ranked probability score, which is a more robust choice than maximum likelihood estimation (Gneiting and Raftery 2007). In other words, we find the values of $a$, $\beta_1, \ldots, \beta_k$, $\gamma$ and $\delta$ that minimize

$$\frac{1}{n} \sum_{j=1}^{n} \mathrm{crps}\left( \mathcal{N}_0\!\left( a + \beta_1^2 X_{j1} + \cdots + \beta_k^2 X_{jk}, \; \gamma^2 + \delta^2 S_j^2 \right), Y_j \right),$$

where the sum extends over the forecast cases in the training set. The method relies on numerical optimization, which is done with the Broyden-Fletcher-Goldfarb-Shanno algorithm (Bertsekas 1999, Section 1.7) as implemented in R. A critical question remains, namely that of an appropriate choice of the training set. This will be addressed in the following section.

2.4 Choice of training data

In real-time forecasting, a popular choice for the training set is that of a rolling training period. Hence, on any given day, we use training data from the most recent m days available. Two decisions are to be made here. One is the choice of the length m of the rolling training period. Clearly, there is a trade-off in doing this. Shorter training periods adapt rapidly to seasonally varying model biases, changes in the performance of the ensemble member models, and changes in environmental conditions. Longer training periods, on the other hand, reduce the statistical variability in the estimation.

Another important decision is the choice of the geographical composition of the training set. We distinguish two different methods for doing this, to which we refer as the Local EMOS and the Regional EMOS technique, respectively. The Regional EMOS technique uses training data from all 107 stations to estimate a single set of parameters across the Pacific Northwest, which is then used to create EMOS forecasts at all stations. Nott, Dunsmuir, Kohn, and Woodcock (2001) and Gneiting, Stanberry, Grimit, Held, and Johnson (2008) noted that localized statistical post-processing can address locally varying biases and dispersion errors in NWP models and ensemble systems.
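The estimation criterion of Section 2.3 can be sketched in a few lines. The code below evaluates the closed-form CRPS of a truncated normal distribution and the mean CRPS over a training set, using the squared parameterisation in (2). It is a hedged illustration: the function names and the data layout (a list of member-list and observation pairs) are assumptions, and no optimizer is included, so this is not the R implementation used for the results.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def crps_truncated_normal(y, mu, sigma):
    """Closed-form CRPS of N_0(mu, sigma^2) at an observation y >= 0."""
    z = (y - mu) / sigma
    p = STD_NORMAL.cdf(mu / sigma)  # mass of the untruncated normal above zero
    bracket = (
        z * p * (2.0 * STD_NORMAL.cdf(z) + p - 2.0)
        + 2.0 * STD_NORMAL.pdf(z) * p
        - STD_NORMAL.cdf(math.sqrt(2.0) * mu / sigma) / math.sqrt(math.pi)
    )
    return sigma * bracket / p ** 2


def mean_crps(theta, cases):
    """Training objective: average CRPS over (members, observation) pairs.

    theta = (a, beta_1, ..., beta_k, gamma, delta) is unconstrained; squaring
    the betas, gamma and delta implements the constraints in (2).
    """
    total = 0.0
    for members, y in cases:
        k = len(members)
        a, betas = theta[0], theta[1:k + 1]
        gamma, delta = theta[k + 1], theta[k + 2]
        mean = sum(members) / k
        s2 = sum((x - mean) ** 2 for x in members) / k
        mu = a + sum(beta ** 2 * x for beta, x in zip(betas, members))
        sigma = math.sqrt(gamma ** 2 + delta ** 2 * s2)
        total += crps_truncated_normal(y, mu, sigma)
    return total / len(cases)


# Hypothetical two-member training set: (member forecasts, observation).
training_cases = [([10.0, 12.0], 9.0), ([6.0, 7.0], 8.0)]
objective = mean_crps([1.0, 0.5, -0.5, 2.0, 1.0], training_cases)
```

A function such as `mean_crps` could then be handed to any general-purpose quasi-Newton minimizer. Because the parameters enter only through their squares, the sign of each beta, gamma and delta is irrelevant to the fitted model.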
The Local EMOS technique thus restricts the training set to the station at hand, and obtains a separate set of parameter estimates at each station.

For the Regional EMOS method, we use a rolling training period that consists of the m = 20 most recent available days. This is a subjective choice, and turns out not to be critical, since the method is highly robust against changes in the length of the training period. In experiments with training periods of m = 10, 15, ..., 50 days the out-of-sample domain-wide mean absolute error (MAE) and mean continuous ranked probability score (CRPS) in 2003 differed by less than 0.5%.

For the Local EMOS technique, we follow recommendations in the extant literature (Wilson, Beauregard, Raftery, and Verret 2007) and use a rolling training period of length m = 40 days. Not unexpectedly, the choice of m matters more than for the Regional EMOS method, because the training set contains one case per day only. Figure 4 shows the out-of-sample MAE and CRPS for the 29 stations in the US state of Washington for training periods of m = 18, 20, ..., 80 days. To facilitate interpretation, the values are normalized station by station, as the ratio of the value at hand and the respective mean over the candidate periods. While there is considerable variability in the empirically optimal value of m, the ratio mostly remains between 0.95 and Training periods of 35 to 70 days generally seem adequate.

Figure 4: Out-of-sample performance measures as a function of the length of the rolling training period for Local EMOS forecasts at the 29 SAO stations in Washington State, for a test period ranging from March through December 2003. (a) Relative mean absolute error (MAE) of the EMOS median forecast. (b) Relative mean continuous ranked probability score (CRPS). For each station, values are normalized in terms of the ratio of the value at hand and the mean over the candidate periods. The station-specific best choice is indicated by a black dot.

3 Results

In this case study, we use the eight-member University of Washington Mesoscale Ensemble (UWME) (Eckel and Mass 2005) to create daily 48-hour ahead probabilistic forecasts of surface wind speed at 107 meteorological stations in the Pacific Northwest in 2003. As noted above, the Regional EMOS technique employs a rolling training period consisting of the 20 most recent available days. The Local EMOS method uses a 40-day rolling training period. For calendar year 2003, data are available for 291 days, for a total of 29,542 individual forecast cases, as described in Section 2.1.
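The two training-set choices of Section 2.4 can be sketched as follows. This is a simplified illustration under stated assumptions: days are integer indices, forecast cases are stored in a dictionary keyed by station and day, and the `lag` argument (how long before the forecast day a case must have verified to be usable) is a guess at the operational constraint rather than something specified in the text.

```python
def rolling_training_days(available_days, forecast_day, m, lag=2):
    """Return the m most recent available days usable for training (m >= 1).

    `lag` is an assumption of this sketch: with a 48-hour lead time, a
    forecast for `forecast_day` is trained only on cases that verified
    at least `lag` days earlier.
    """
    eligible = sorted(day for day in available_days if day <= forecast_day - lag)
    return eligible[-m:]


def regional_training_set(cases, days):
    """Regional EMOS: pool the cases from all stations on the given days."""
    wanted = set(days)
    return [data for (station, day), data in cases.items() if day in wanted]


def local_training_set(cases, days, station):
    """Local EMOS: keep only the cases from the station at hand."""
    wanted = set(days)
    return [data for (s, day), data in cases.items()
            if s == station and day in wanted]


# Hypothetical archive with missing days; values stand in for forecast cases.
archive = {("A", 3): "a3", ("A", 8): "a8", ("B", 8): "b8", ("A", 9): "a9"}
days = rolling_training_days([1, 2, 3, 5, 6, 8, 9, 10], forecast_day=11, m=3)
regional = regional_training_set(archive, days)
local = local_training_set(archive, days, "A")
```

With m days in the window, the regional set holds up to one case per station per day, whereas the local set holds at most m cases, which is why the local estimates depend more strongly on the choice of m.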

Figure 5: Regional EMOS parameter estimates for the predictive model (1) over the Pacific Northwest. The daily index ranges over the 290 days for which UWME forecasts are available in 2003. (a) Fitted intercept a. (b) The panel indicates the UWME member with the largest coefficient $b_i$, with identifications as in Table 1. (c) Fitted variance parameter c. (d) Fitted variance parameter d.

3.1 Fitted predictive model

The Regional EMOS technique uses data from all 107 stations to estimate a single set of parameters across the Pacific Northwest from a 20-day rolling training period. Figure 5 shows how the parameter estimates for the predictive model (1) evolve over calendar year 2003. The fitted intercept, a, ranges from -3 to 3 knots, with the higher values occurring in the warm season. Most of the time, the first or the eighth ensemble member receives the largest regression coefficient $b_i$. As Table 1 shows, these members are driven by initial conditions supplied by the Aviation Model (AVN), run by the US National Centers for Environmental Prediction (NCEP), and by the Unified Model run by the UK Met Office (UKMO), which are generally considered the best sources. The fitted variance parameter c ranges from 10 to 30, with the higher values occurring in the cold season, and the fitted variance parameter d is mostly positive.

The Local EMOS technique restricts the training set to the station at hand, and obtains a separate set of parameter estimates for each station. Figure 6 shows the fit at the station in The Dalles, Oregon, one of the windiest places in the Pacific Northwest. The estimates are generally less stable than the Regional EMOS estimates, which is unsurprising in view of the diminished, local training set. The variance parameter d is frequently estimated very near zero.
One such instance is shown in Table 1, which gives details for the Local EMOS forecast at The Dalles, Oregon, valid June 14, 2003. The eight UWME ensemble member forecasts range from to The Local EMOS method fits an intercept of 3.65 and assigns the highest coefficients to the CMCG, ETA and GASP members. It corrects for the low bias in the numerical model and adjusts the ensemble spread, to give a realistic estimate of the forecast uncertainty. The Local EMOS median forecast is knots, and the verifying observation is 18 knots. A graphical illustration is given in Figure 2.
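The median forecast and the 77.8% central prediction interval follow directly from the quantile function of the fitted truncated normal distribution. The sketch below shows one way to compute them. The function names are illustrative, and the location and spread values in the example are hypothetical rather than the fitted values at The Dalles.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def truncated_normal_quantile(prob, mu, sigma):
    """Quantile of N_0(mu, sigma^2) at level prob, with 0 < prob < 1.

    Obtained by inverting F(x) = 1 - (1 - Phi((x - mu) / sigma)) / Phi(mu / sigma).
    """
    p = STD_NORMAL.cdf(mu / sigma)
    return mu + sigma * STD_NORMAL.inv_cdf(1.0 - (1.0 - prob) * p)


def central_interval(mu, sigma, members=8):
    """Central interval with nominal coverage (members - 1) / (members + 1),
    that is 7/9 or about 77.8% for an eight-member ensemble."""
    lower = truncated_normal_quantile(1.0 / (members + 1), mu, sigma)
    upper = truncated_normal_quantile(members / (members + 1), mu, sigma)
    return lower, upper


# Hypothetical fitted location and spread in knots.
mu, sigma = 14.0, 4.0
median = truncated_normal_quantile(0.5, mu, sigma)
lower, upper = central_interval(mu, sigma)
```

The 7/9 level is chosen so that the interval can be compared directly with the range of the eight-member ensemble, which has the same nominal coverage.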

Figure 6: Local EMOS parameter estimates for the predictive model (1) at The Dalles, Oregon. The daily index ranges over the 274 days for which UWME forecasts are available in 2003 at this site. (a) Fitted intercept a. (b) The panel indicates the UWME member with the largest coefficient $b_i$, with identifications as in Table 1. (c) Fitted variance parameter c. (d) Fitted variance parameter d.

Table 1: Local EMOS forecast at The Dalles, Oregon, valid June 14, 2003. The first row shows the parameter estimates for the predictive model (1). The second row shows the UWME member forecasts in knots. Individual members are identified by the acronyms used by Eckel and Mass (2005). The EMOS median forecast is knots, and the verifying wind speed is 18 knots. See Figure 2 for a graphical illustration.

Parameter   a   b_1   b_2    b_3   b_4    b_5   b_6    b_7    b_8    c   d
Source          AVN   CMCG   ETA   GASP   JMA   NGPS   TCWB   UKMO
Estimate
Forecast

3.2 Predictive performance over the Pacific Northwest

We now assess the out-of-sample predictive performance of the EMOS technique relative to the unprocessed ensemble forecast, as well as persistence and climatological reference forecasts. The persistence forecast is a naive point forecast: it predicts future wind speeds by the most recent available observed wind speed at the given site. The regional climatological forecast is the predictive distribution obtained from the empirical distribution of the wind observations when aggregated over the Pacific Northwest and the calendar year 2003. In the local climatological forecast, a site-specific predictive distribution is created for each site based on the observations at that site from the calendar year 2003. Strictly speaking, this is not a forecast, because both past and future observations are used. However, it provides an often used reference standard.
As noted, the goal in probabilistic forecasting is to maximize the sharpness of the predictive distributions subject to calibration (Gneiting, Balabdaoui, and Raftery 2007). Calibration refers to the statistical consistency between the predictive distributions and the observations. Sharpness refers to the concentration of the predictive distributions: the more concentrated, the sharper, and the sharper, the better, subject to calibration. In addition to calibration and sharpness checks, we report summary measures of predictive performance, such as the mean continuous ranked probability score (CRPS) and the mean absolute error (MAE). For a probabilistic forecast, we report the MAE for the optimal point forecast under the linear loss function, namely the median of the respective predictive distribution (Gneiting 2008b).

Table 2: Mean continuous ranked probability score (CRPS) and mean absolute error (MAE), and coverage and average width of the 77.8% central prediction interval for probabilistic forecasts of wind speed over the Pacific Northwest in 2003. Coverage in percent, all other values in knots. The MAE refers to the point forecast given by the median of the respective predictive distribution.

Forecast   CRPS   MAE   Coverage   Width
Persistence   4.49
Regional Climatology
Local Climatology
UWME
Regional EMOS
Local EMOS

Table 2 shows the CRPS and MAE for the various types of forecasts. All results are spatially and temporally aggregated, over the Pacific Northwest and calendar year 2003. Both the Regional EMOS and the Local EMOS technique show better predictive performance than the unprocessed UWME forecast, persistence, or the climatology forecasts. The Local EMOS technique clearly performs the best.

To assess calibration, we consider Figure 7, which shows the verification rank histogram for the UWME forecast and probability integral transform (PIT) histograms for the EMOS density forecasts. The verification rank histogram plots the rank of each observed wind speed relative to the eight ensemble member forecasts (Anderson 1996; Hamill and Colucci 1997; Talagrand, Vautard, and Strauss 1997).
If the ensemble members and the observation are exchangeable, all possible ranks are equally likely, and the histogram is uniform. The PIT is the value that the predictive cumulative distribution function attains at the observation and is a continuous analog of the verification rank (Dawid 1984; Diebold et al. 1998; Gneiting et al. 2007). Again, deviations from uniformity indicate a lack of calibration.

The unprocessed ensemble forecast (UWME) is under-dispersed, in that too many observations fall outside the ensemble range, which forms a nominal 7/9 or 77.8% central prediction interval. Indeed, under exchangeability there is a 1/9 chance for the observed wind speed to fall below the range of the eight-member ensemble, and a 1/9 chance to fall above. The PIT histograms for the post-processed EMOS techniques show substantial improvement in calibration.

Figure 7: Calibration checks for probabilistic forecasts of wind speed over the Pacific Northwest in 2003. (a) Verification rank histogram for the UWME forecast. (b) PIT histogram for the Regional EMOS technique. (c) PIT histogram for the Local EMOS technique.

To allow for direct comparisons, Table 2 shows the empirical coverage and the average width of 7/9 or 77.8% central prediction intervals from the various types of probabilistic forecasts. The results for the empirical coverage echo what we see in the histograms, in that the unprocessed ensemble forecast is highly uncalibrated. The EMOS intervals are much better calibrated, even though the Local EMOS forecasts are, on average, slightly under-dispersed.

To quantify sharpness, Table 2 shows the average width of the 77.8% central prediction interval for the various methods. The unprocessed ensemble forecast (UWME) is very sharp, but at the expense of being uncalibrated. Local EMOS returns sharper predictive distributions than Regional EMOS, and both EMOS methods are sharper than the climatological reference forecasts.

3.3 Predictive performance at individual stations

We now turn to results at the 107 SAO stations individually. Figures 8 and 9 compare the Local EMOS forecast to the unprocessed ensemble forecast (UWME) and the Regional EMOS forecast in terms of station-specific CRPS and MAE values. The color of the points in the scatter-plots indicates the site-specific average observed wind speed in calendar year 2003. The lower tercile of the stations is shown in blue, the middle tercile in green, and the upper tercile in red.

Several patterns emerge. Forecasts at stations with higher mean wind speed generally are more difficult. Mostly, the Local EMOS forecast performs the best and shows the lowest CRPS and MAE values.
The Regional EMOS technique shows the lowest CRPS at 31 sites, that is, about one in three stations, corresponding roughly to the middle tercile of the mean wind speed. For the MAE, this number is slightly lower. In the cases in which the Regional EMOS technique outperforms the Local EMOS method, the improvement is marginal.

Table 3: Mean continuous ranked probability score (CRPS) and mean absolute error (MAE), and coverage and average width of the 77.8% central prediction interval for probabilistic forecasts of wind speed at The Dalles, Oregon in 2003. Coverage in percent, all other values in knots. The MAE refers to the point forecast given by the median of the respective predictive distribution.

Forecast   CRPS   MAE   Coverage   Width
Persistence   6.17
Regional Climatology
Local Climatology
UWME
Regional EMOS
Local EMOS

Conversely, if the Local EMOS method outperforms the Regional EMOS technique, the improvement can be substantial. Hence, for a typical station in the middle tercile, the predictive performance can be improved slightly by including off-site stations in the training set. For an atypical station, however, the inclusion of training data from other stations may lead to bias and dispersion corrections that are not representative of the local climate. In light of this, we prefer the Local EMOS method if predictions at a specific location are sought, such as at a wind energy or wind surfing site.

That said, it is possible that even for atypical stations the addition of carefully selected off-site data to the training set turns out to be beneficial. Potentially, climatological and geographic information can serve to cluster observation sites, to provide guidance in the spatial composition of the training set. Such a method could also be used to post-process NWP ensemble forecasts directly on the model grid, similarly to the bias removal technique developed by Mass, Baars, Wedam, Grimit, and Steed (2008).

Finally, we consider results at the city of The Dalles, Oregon, which is located at the eastern terminus of the Columbia River Gorge on the border between the US states of Washington and Oregon. The winds at The Dalles are generally dictated by the channeling effects of the Columbia River Gorge, the sole near-sea-level passage through the Cascade Mountains (Mass 2008).
This is one of the windiest places in the Pacific Northwest. There are several wind farms nearby, and the city has a reputation for being the best place to learn wind surfing. Summary measures of the predictive performance for the various forecast techniques at The Dalles are shown in Table 3. The Local EMOS method outperforms its competitors. Figure 10 illustrates the Local EMOS forecast distributions at The Dalles for the period of June 14 through July 31. Note that the Local EMOS density forecast for the first day in the display, June 14, is illustrated in Table 3 and Figure 2.
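The coverage and width columns of Table 3 can be computed from the lower and upper quantiles of each predictive distribution. The nominal level of 77.8% equals 7/9, which is the probability that the observation falls within the range of an eight-member ensemble when the members and the observation are exchangeable. The following Python sketch assumes, for illustration only, that each forecast is a normal distribution censored at zero with given location and scale. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist


def censored_quantile(mu, sigma, p):
    """Quantile of a N(mu, sigma^2) variable censored at zero (assumed form)."""
    return max(0.0, NormalDist(mu, sigma).inv_cdf(p))


def central_interval(mu, sigma, level=7 / 9):
    """End points of the central prediction interval at the given level."""
    tail = (1.0 - level) / 2.0
    lower = censored_quantile(mu, sigma, tail)
    upper = censored_quantile(mu, sigma, 1.0 - tail)
    return lower, upper


def coverage_and_width(forecasts, observations, level=7 / 9):
    """Empirical coverage in percent and average interval width."""
    hits = 0
    total_width = 0.0
    for (mu, sigma), obs in zip(forecasts, observations):
        lower, upper = central_interval(mu, sigma, level)
        hits += lower <= obs <= upper
        total_width += upper - lower
    n = len(observations)
    return 100.0 * hits / n, total_width / n


# Hypothetical (location, scale) pairs in knots and made-up observations.
forecasts = [(12.0, 4.0), (8.0, 3.0), (15.0, 5.0), (2.0, 3.0)]
observations = [10.0, 14.0, 16.0, 0.0]
coverage, width = coverage_and_width(forecasts, observations)
```

A calibrated forecast should show coverage close to the nominal 77.8% over a long verification period, and among calibrated forecasts a smaller average width indicates a sharper forecast.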

Figure 8: Comparison of the station-specific mean continuous ranked probability score (CRPS) for (a) the Local EMOS technique versus the unprocessed UWME forecast and (b) the Local EMOS technique versus the Regional EMOS method. The color of the points indicates the site-specific average observed wind speed. The lower tercile of the stations is shown in blue, the middle tercile in green, and the upper tercile in red. The scores are aggregated over the calendar year.

Figure 9: Same as Figure 8, but for the mean absolute error (MAE).
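The station-specific scores compared in Figures 8 and 9 are averages of the CRPS and of the absolute error of the median forecast. As a rough sketch, the CRPS can be evaluated numerically from its integral definition, the integral of (F(x) - 1{x >= y})^2 over x, where F is the predictive distribution function and y the observation. The code below assumes a normal distribution censored at zero as the predictive distribution; the helper names, the midpoint rule, and the grid size are illustrative choices rather than the authors' implementation.

```python
from statistics import NormalDist


def midpoint_integral(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def crps_censored_normal(mu, sigma, obs):
    """CRPS of N(mu, sigma^2) censored at zero (assumed form), for obs >= 0.

    Below zero the censored distribution function is 0, so only x >= 0
    contributes. The integral is split at obs to keep both integrands smooth.
    """
    dist = NormalDist(mu, sigma)
    upper = max(obs, mu + 10.0 * sigma)
    below = midpoint_integral(lambda x: dist.cdf(x) ** 2, 0.0, obs)
    above = midpoint_integral(lambda x: (1.0 - dist.cdf(x)) ** 2, obs, upper)
    return below + above


def mean_scores(forecasts, observations):
    """Mean CRPS and MAE of the median point forecast over all cases."""
    n = len(observations)
    crps = sum(crps_censored_normal(m, s, y)
               for (m, s), y in zip(forecasts, observations)) / n
    # The median of a normal distribution censored at zero is max(0, mu).
    mae = sum(abs(max(0.0, m) - y)
              for (m, _), y in zip(forecasts, observations)) / n
    return crps, mae


# Hypothetical forecast (location, scale) in knots and a made-up observation.
mean_crps, mae = mean_scores([(20.0, 2.0)], [21.0])
```

When the location is many standard deviations above zero, censoring is negligible and the result agrees with the closed-form CRPS of a normal distribution, which offers a simple sanity check.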

Figure 10: 48-hour ahead Local EMOS forecasts of maximum wind speed (in knots) at The Dalles, Oregon for June 14 through July 31. The Local EMOS 77.8% prediction interval is shown in gray. The small black dots represent the eight members of the University of Washington Mesoscale Ensemble; the blue points show the verifying wind speeds. Missing days are due to missing data.

4 Discussion

We have shown how to apply heteroskedastic censored (Tobit) regression to statistically post-process ensemble forecasts of wind speed. This is in the tradition of regression or model output statistics (MOS) approaches that yield substantial improvement in the accuracy of point forecasts from numerical weather prediction (NWP) models. In Gneiting et al. (2005) and the current paper these methods have been developed further to yield the ensemble model output statistics (EMOS) technique, which generates full predictive distributions for future weather quantities, rather than just a bias-corrected point forecast.

In experiments with the University of Washington Mesoscale Ensemble (UWME) (Eckel and Mass 2005), we applied the EMOS technique to create 48-hour ahead forecasts of surface wind speed over the North American Pacific Northwest. The EMOS density forecasts turn out to be calibrated and sharp. They correct for biases and are much better calibrated than the raw ensemble, which is under-dispersive. The Local EMOS forecast distributions are sharp, in that the prediction intervals are much shorter on average than prediction intervals based on climatology. Furthermore, the median of the Local EMOS density provides a point forecast with much lower MAE than the ensemble median, or other reference forecasts.

The UWME member forecasts come from clearly distinguishable sources.
Other ensemble systems, such as the global ensembles run by the European Centre for Medium-Range Weather Forecasts and the US National Centers for Environmental Prediction, have members that differ only in some random perturbations (Buizza et al. 2005). In these cases, members with identical statistical properties ought to be treated as exchangeable, and thus ought to have equal EMOS coefficients. This can be enforced easily, by constraining the regression coefficients $b_i = \beta_i^2$ in (1) and (2) to be equal.

Two general approaches to the statistical post-processing of NWP ensembles have emerged recently (Wilks 2006a; Bröcker and Smith 2008). The ensemble model output statistics (EMOS) approach pursued here fits a single, parametric predictive distribution using summary statistics from the ensemble. Another approach is based on kernel dressing or Bayesian model averaging (BMA), where each individual ensemble member is associated with a kernel function (Raftery et al. 2005; Sloughter et al. 2007). Pinson and Madsen (2008) use Gaussian kernels in a wind energy application. In Sloughter et al. (2008), BMA with kernel functions given by gamma densities is applied to obtain predictive distributions for wind speed, using the same data as presented here. The BMA method uses regional parameter estimation; thus it corresponds to the Regional EMOS technique, and results for the two methods can be directly compared.¹ The predictive performance of the two techniques is nearly the same, as can be seen from our Table 2 and Table 1 of Sloughter et al. (2008). However, the EMOS technique is much simpler conceptually and is easier to implement. An R package tentatively named ensemblemos is under preparation.

The EMOS method does not take temporal or spatial correlation into account, in contrast to the approach taken by Gneiting et al. (2006), who build a spatio-temporal statistical model for short-range forecasts of wind speed at a prediction horizon of two hours. Indeed, the modeling of spatial or temporal correlation does not appear to be justifiable in the current context, since the dynamic evolution of the atmosphere is already captured by the NWP model.
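The exchangeability constraint mentioned earlier in this discussion can be illustrated with a small sketch. It assumes that the location parameter is an affine function a + b_1 x_1 + ... + b_m x_m of the member forecasts with b_i = beta_i^2, a form suggested by the text but not restated in this section. The function names and numbers are hypothetical.

```python
def location(a, betas, members):
    """Location a + sum_i beta_i^2 * x_i (assumed form).

    Squaring each beta keeps the member weights nonnegative.
    """
    return a + sum(beta ** 2 * x for beta, x in zip(betas, members))


def location_exchangeable(a, beta, members):
    """Same model with a single shared beta for all exchangeable members."""
    return location(a, [beta] * len(members), members)


# Hypothetical member forecasts in knots.
members = [10.0, 12.0, 9.0, 11.0]
shared = location_exchangeable(1.0, 0.5, members)

# With equal weights the location depends on the members only through
# the ensemble mean: a + m * beta^2 * mean.
ensemble_mean = sum(members) / len(members)
via_mean = 1.0 + len(members) * 0.5 ** 2 * ensemble_mean
```

Under this constraint the number of free member weights drops from m to one, which is the natural choice when the members cannot be told apart statistically.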
The EMOS model addresses forecast biases, dispersion and phase errors, and the predictive distribution is conditional on the NWP ensemble forecast. In many applications, a wind speed forecast at a single prediction horizon and a single location is needed, such as a wind farm, an airport, or a wind surfing or sailing site. The EMOS method in its current form is tailored to this situation. If instead we are interested in future temporal and/or spatial trajectories of wind speed, the modeling of temporal and/or spatial dependencies becomes critical. Methods for probabilistic weather forecasting at multiple locations simultaneously have been developed for temperature (Gel, Raftery, and Gneiting 2004; Berrocal, Raftery, and Gneiting 2007) and precipitation (Berrocal, Raftery, and Gneiting 2008), and can possibly be adapted to wind speed. Pinson, Madsen, Nielsen, Papaefthymiou, and Klöckl (2008) study methods of probabilistic forecasting for temporal trajectories of wind resources. Presumably, these methods could be combined and applied to wind speed and wind energy, to provide decision support in a wealth of problems that are of economic, environmental and societal importance, such as air traffic control, ship routing, and wind power generation over a region or country. A first, exploratory step in this direction is taken by Vlasova, Pinson, Kotwa, Madsen, and Nielsen (2008).

¹ Sloughter et al. (2008) work with two additional SAO stations, which we discard because they have fewer than 40 days of training data in late 2002 and hence cannot support our standard Local EMOS forecast. However, we ran Regional EMOS with the two stations added, which does not lead to any changes in Table 2. A Local BMA technique has not been implemented yet.

Acknowledgements

We are grateful to Jeff Baars, Veronica J. Berrocal, Chris Fraley, Clifford F. Mass, Adrian E. Raftery and J. McLean Sloughter for helpful discussions and for providing code and data. This research was supported by the National Science Foundation under Awards ATM and DMS, and by the Joint Ensemble Forecasting System (JEFS) under subcontract S from the University Corporation for Atmospheric Research (UCAR).

References

Allcroft, D. J. and C. A. Glasbey (2003). A latent Gaussian Markov random-field model for spatiotemporal rainfall disaggregation. Applied Statistics 52.
Anderson, J. L. (1996). A method for producing and evaluating probabilistic forecasts from ensemble model integrations. Journal of Climate 9.
Baars, J. (2005). Observations QC summary page. Available at washington.edu/mm5rt/qc obs/qc obs stats.html.
Berrocal, V. J., A. E. Raftery, and T. Gneiting (2007). Combining spatial statistical and ensemble information in probabilistic weather forecasts. Monthly Weather Review 135.
Berrocal, V. J., A. E. Raftery, and T. Gneiting (2008). Probabilistic quantitative precipitation field forecasting using a two-stage spatial model. Annals of Applied Statistics, in press.
Bertsekas, D. P. (1999). Nonlinear Programming (2nd ed.). Athena Scientific.
Bremnes, J. B. (2004). Probabilistic wind power forecasts using local quantile regression. Wind Energy 7.
Bröcker, J. and L. A. Smith (2008). From ensemble forecasts to predictive distribution functions. Tellus Ser. A 60.
Brown, B. G., R. W. Katz, and A. H. Murphy (1984). Time series models to simulate and forecast wind speed and wind power. Journal of Climate and Applied Meteorology 23.
Buizza, R., P. L. Houtekamer, Z. Toth, G. Pellerin, M. Wei, and Y. Zhu (2005). A comparison of the ECMWF, MSC and NCEP global ensemble prediction systems. Monthly Weather Review 133.

Campbell, S. D. and F. X. Diebold (2005). Weather forecasting for weather derivatives. Journal of the American Statistical Association 100.
Chen, S. and S. Khan (2000). Estimating censored regression models in the presence of nonparametric multiplicative heteroskedasticity. Journal of Econometrics 98.
Chib, S. (1992). Bayes inference in the Tobit censored regression model. Journal of Econometrics 51.
Costa, A., A. Crespo, J. Navarro, G. Lizcano, H. Madsen, and E. Feitosa (2008). A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews 12.
Dawid, A. P. (1984). Statistical theory: The prequential approach (with discussion and rejoinder). Journal of the Royal Statistical Society Ser. A 147.
Diebold, F. X., T. A. Gunther, and A. S. Tay (1998). Evaluating density forecasts with applications to financial risk management. International Economic Review 39.
Eckel, A. F. and C. F. Mass (2005). Aspects of effective mesoscale, short-range ensemble forecasting. Weather and Forecasting 20.
Gel, Y., A. E. Raftery, and T. Gneiting (2004). Calibrated probabilistic mesoscale weather field forecasting: The geostatistical output perturbation (GOP) method (with discussion and rejoinder). Journal of the American Statistical Association 99.
Genton, M. and A. Hering (2007). Blowing in the wind. Significance 4.
Glahn, H. R. and D. A. Lowry (1972). The use of model output statistics (MOS) in objective weather forecasting. Journal of Applied Meteorology 11.
Global Wind Energy Council (2008). Global Wind 2007 Report. Available at
Gneiting, T. (2008a). Editorial: Probabilistic forecasting. Journal of the Royal Statistical Society Ser. A 171.
Gneiting, T. (2008b). Quantiles as optimal point predictors. Technical Report 538, Department of Statistics, University of Washington. Available at washington.edu/research/reports/.
Gneiting, T., F. Balabdaoui, and A. E. Raftery (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Ser. B 69.
Gneiting, T., K. Larson, K. Westrick, M. G. Genton, and E. Aldrich (2006). Calibrated probabilistic forecasting at the Stateline wind energy center: The regime-switching space-time method. Journal of the American Statistical Association 101.
Gneiting, T. and A. E. Raftery (2005). Weather forecasting with ensemble methods. Science 310.

Gneiting, T. and A. E. Raftery (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102.
Gneiting, T., A. E. Raftery, A. H. Westveld, and T. Goldman (2005). Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Monthly Weather Review 133.
Gneiting, T., L. I. Stanberry, E. P. Grimit, L. Held, and N. A. Johnson (2008). Assessing probabilistic forecasts of multivariate quantities, with applications to ensemble predictions of surface winds (with discussion and rejoinder). Test 17.
Good, I. J. (1952). Rational decisions. Journal of the Royal Statistical Society Ser. B 14.
Grell, G. A., J. Dudhia, and D. R. Stauffer (1995). A description of the fifth-generation Penn State/NCAR mesoscale model (MM5). Technical Note NCAR/TN-398+STR. Available at
Grimit, E. P., T. Gneiting, V. J. Berrocal, and N. A. Johnson (2006). The continuous ranked probability score for circular variables and its application to mesoscale forecast ensemble verification. Quarterly Journal of the Royal Meteorological Society 132.
Hamill, T. M. and S. J. Colucci (1997). Verification of Eta-RSM short-range ensemble forecasts. Monthly Weather Review 125.
Jun, S., R. Knutti, and D. W. Nychka (2008). Spatial analysis to quantify numerical model bias and dependence: How many climate models are there? Journal of the American Statistical Association 103.
Kharin, V. V. and F. W. Zwiers (2002). Climate predictions with multimodel ensembles. Journal of Climate 15.
Kretzschmar, R., P. Eckert, D. Cattani, and F. Eggimann (2004). Neural network classifiers for local wind prediction. Journal of Applied Meteorology 43.
Krishnamurti, T. N., C. M. Kishtawal, T. E. LaRow, D. R. Bachiochi, Z. Zhang, C. E. Williford, S. Gadgil, and S. Surendran (1999). Improved weather and seasonal climate forecasts from multimodel superensemble. Science 285.
Leslie, D. S., R. Kohn, and D. J. Nott (2007). A general approach to heteroskedastic regression. Statistics and Computing 17.
Mass, C. (2008). The Weather of the Pacific Northwest. University of Washington Press.
Mass, C. F., J. Baars, G. Wedam, E. Grimit, and R. Steed (2008). Removal of systematic model bias on a model grid. Weather and Forecasting 23.
Matheson, J. E. and R. L. Winkler (1976). Scoring rules for continuous probability distributions. Management Science 22.
Møller, J. K., H. A. Nielsen, and H. Madsen (2008). Time-adaptive quantile regression. Computational Statistics & Data Analysis 52.

National Research Council (2006). Completing the Forecast: Characterizing and Communicating Uncertainty for Better Decisions Using Weather and Climate Forecasts. The National Academies Press.
Nielsen, H. A., H. Madsen, and T. S. Nielsen (2006). Using quantile regression to extend an existing wind power forecasting system with probabilistic forecasts. Wind Energy 9.
Nott, D. J., W. T. M. Dunsmuir, R. Kohn, and F. Woodcock (2001). Statistical correction of a deterministic numerical weather prediction model. Journal of the American Statistical Association 96.
Pal, S. (2009). On a conjectured sharpness principle for probabilistic forecasting with calibration. Biometrika, in press.
Palmer, T. N. (2002). The economic value of ensemble forecasts as a tool for risk assessment: From days to decades. Quarterly Journal of the Royal Meteorological Society 128.
Pinson, P., C. Chevallier, and G. N. Kariniotakis (2007). Trading wind generation with short-term probabilistic forecasts of wind power. IEEE Transactions on Power Systems 22.
Pinson, P. and H. Madsen (2008). Ensemble-based probabilistic forecasting at Horns Rev. Wind Energy, in press.
Pinson, P., H. Madsen, H. A. Nielsen, G. Papaefthymiou, and B. Klöckl (2008). From probabilistic forecasts to statistical scenarios of short-term wind power production. Wind Energy, in press.
Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski (2005). Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review 133.
Roulston, M. S., D. T. Kaplan, J. Hardenberg, and L. A. Smith (2003). Using medium-range weather forecasts to improve the value of wind energy production. Renewable Energy 28.
Sansò, B. and L. Guenni (1999). Venezuelan rainfall data analysed by using a Bayesian space-time model. Applied Statistics 48.
Shao, X., M. L. Stein, and J. Ching (2007). Statistical comparisons of methods for interpolating the output of a numerical air quality model. Journal of Statistical Planning and Inference 137.
Sloughter, J. M., T. Gneiting, and A. E. Raftery (2008). Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. Technical Report 544, Department of Statistics, University of Washington. Available at edu/research/reports/.