Reprint 829

Objective Verification of Weather Forecast in Hong Kong

Y.K. Leung

Fourth International Verification Methods Workshop, Helsinki, Finland, 4-10 June 2009
Objective Verification of Weather Forecast in Hong Kong

LEUNG Yin-kong
Hong Kong Observatory

1. Introduction

The Hong Kong Observatory (HKO) is responsible for providing weather forecasts and warnings for the public of Hong Kong to reduce loss of life and damage to property. The accuracy of these forecasts is one of the key indicators of HKO's performance. To assess the accuracy of weather forecasts provided by HKO, an objective weather forecast verification scheme for Hong Kong was developed and computerized. This paper outlines the essential features of the scheme and the methodology employed, including categorical, spatial and user-oriented verification. It also compares the accuracy of weather forecasts issued by weather forecasters with that of persistence forecasts.

Since 1989, HKO has commissioned an independent consultant to conduct two public opinion surveys every year (April and October) to gauge the public's subjective perception of the accuracy of weather forecasts issued by HKO (ACI, 2008). In this paper, the subjective ratings from these opinion surveys are compared with the HKO objective verification scores using time-lagged stepwise regression.

2. HKO objective verification scheme

The HKO verification scheme for weather forecasts was first developed in 1984 and has since undergone several major revisions (Li, 1997). The scheme was devised to reflect as closely as possible how the public would evaluate the accuracy of weather forecasts. Verification procedures in the scheme were standardized, computerized and automated. In the scheme, a score based on the accuracy of the following six weather elements, namely wind speed,
state of sky, precipitation, visibility, maximum temperature and minimum temperature, is given for each forecast issued. To take into account the significance of individual weather elements at different times of the year, different weightings are assigned to the weather elements for different time periods according to climatology and the relative importance of the elements at that time of the year in the eyes of the public (Table 1). The final score for a forecast is obtained by summing the products of the individual element scores and the corresponding weightings:

    S = Σᵢ Wᵢ Sᵢ  (summing over i = 1 to 6),  with  Σᵢ Wᵢ = 1,

where S is the final score of a forecast, Sᵢ the element score for the i-th weather element and Wᵢ the corresponding weighting.

Using this verification scheme, the accuracy of both the weather forecasts issued by weather forecasters and the persistence forecasts is assessed, so as to measure the skill of forecasters in identifying weather changes. A persistence forecast is a forecast coded up by the computer from the actual weather conditions of the day, assuming that the weather the next day will be the same. The verification methodology for each weather element is summarized below.

2.1 Wind

Wind speed is forecast on the Beaufort scale. A wind forecast gets 100 marks if the actual averaged hourly mean wind speed falls within the range of the corresponding category (Table 2). Marks are deducted according to a continuous curve (e.g. Figure 1) if the actual wind speed is outside the forecast range. Wind direction is not verified in the scheme.

2.2 State of sky

The state of sky is divided into two parts, namely sunshine duration and cloud amount. A state of sky forecast gets 100 marks if the actual sunshine duration (%) and the mean cloud amount (oktas) fall within the ranges of the corresponding
category (Table 3). Marks are deducted according to continuous curves (e.g. Figures 2a and 2b) if the actual sunshine duration and cloud amount fall outside those ranges. The final state of sky score is the arithmetic mean of the sunshine duration and cloud amount scores during periods of non-zero available sunshine; otherwise, the score is given by the cloud amount score alone. For a forecast covering two periods, i.e. one with available sunshine and the other without, the final score is the weighted mean of the scores of the two periods according to their respective lengths.

2.3 Precipitation

Precipitation forecasts are given in categories. Four categories are defined and their respective ranges of 24-hour rainfall amount are listed in Table 4. If the forecast period with rain is shorter or longer than 24 hours, the rainfall figures in Table 4 are scaled in proportion to the length of the period. The average of the readings from six automatic rain-gauges and the manual rain-gauge at the HKO Headquarters (HKOHq) (Figure 3) is taken as the actual rainfall. If the data at any one of the rain-gauges are incomplete, that gauge is excluded from the computation of the average. The marking for each category against the actual rainfall is shown in Figure 4. An additional score (positive or negative) for the thunderstorm forecast (Table 5) is added to the rainfall score to give the total score for precipitation. To cater for this thunderstorm score adjustment, the final score is taken to be zero if the total score is less than zero; similarly, the final score is taken as 100 if the total score is greater than 100.

2.4 Visibility

Visibility forecasts are given in categories. The visibility ranges for the categories, namely fog, mist, haze and low visibility, are listed in Table 6. The lowest hourly visibility reading among the three observing stations HKOHq, the Hong Kong International Airport and Waglan Island is taken for verification.
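The precipitation scoring steps above (gauge averaging, thunderstorm adjustment, clamping to 0-100) can be sketched as follows. This is a minimal illustration, not HKO's operational code: the rainfall score itself would come from the marking curves of Figure 4, which are not reproduced here, and the helper names are hypothetical.

```python
# Additional thunderstorm score from Table 5, keyed by
# (thunderstorm forecast, thunderstorm reported).
THUNDERSTORM_SCORE = {
    (True, True): +20,
    (True, False): -5,
    (False, True): -5,
    (False, False): 0,
}

def average_rainfall(gauge_readings):
    """Average the available gauge readings, excluding gauges with
    incomplete data (represented here as None)."""
    valid = [r for r in gauge_readings if r is not None]
    return sum(valid) / len(valid) if valid else 0.0

def precipitation_score(rainfall_score, forecast_ts, observed_ts):
    """Add the thunderstorm adjustment to the rainfall score and
    clamp the total to the range [0, 100]."""
    total = rainfall_score + THUNDERSTORM_SCORE[(forecast_ts, observed_ts)]
    return max(0, min(100, total))

print(average_rainfall([12.0, None, 10.0]))  # gauge with missing data skipped -> 11.0
print(precipitation_score(100, False, True))  # missed thunderstorm: 100 - 5 = 95
print(precipitation_score(3, True, False))    # 3 - 5 would be -2, clamped to 0
```

The clamping step mirrors the scheme's rule that the thunderstorm adjustment must not push the final precipitation score below 0 or above 100.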
Full marks are given if the visibility reading falls within the range of the corresponding category in Table 6, and marks are deducted according to a continuous curve (e.g. Figure 5) if the reading falls outside that range.

2.5 Maximum and minimum temperatures

Maximum and minimum temperature forecasts are verified against the maximum and minimum temperatures recorded at HKOHq respectively. Full marks are given if the forecast temperature is within ±1.5 degrees of the recorded temperature. Other markings for maximum and minimum temperatures are shown in Figure 6.

3. Trend of HKO objective verification scores

Figure 7 shows the time series of the monthly mean objective verification scores. The score shows a general rising trend, in line with scientific advancement in weather observations, remote sensing technology, numerical weather prediction and weather forecasting skills. The score also exhibits a seasonal pattern, with lower scores in the spring and summer months (March to September) and higher scores in the autumn and winter months (October to February) (Table 7). In the scheme, higher weightings are assigned to visibility and precipitation in the spring and summer months, and to minimum temperature in the winter months. The score pattern is generally in line with the higher forecasting skill of numerical models for synoptic-scale systems than for meso-scale systems. The lower scores in spring and summer may reflect the difficulty of forecasting meso-scale systems such as rainstorms and fog/mist, which prevail during that time of the year. In contrast, the higher scores in winter may reflect the better skill in capturing the surge and retreat of the synoptic-scale northeast monsoon, which dominates the weather in winter.

A comparison of the verification scores with the persistence scores (Figure 8) shows that the verification score is generally about 10 marks higher than the persistence score. This indicates clearly that weather forecasters have skill in predicting weather changes.
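The overall score defined in Section 2 is a weighted sum of the six element scores. A minimal sketch, using the January weightings from Table 1 and hypothetical element scores (the element scores here are illustrative, not survey data):

```python
# Final score S = sum_i(W_i * S_i) with sum_i(W_i) = 1, as in Section 2.
# Weightings for January from Table 1, expressed as fractions.
WEIGHTS_JANUARY = {
    "wind": 0.15,
    "state_of_sky": 0.15,
    "precipitation": 0.20,
    "visibility": 0.00,
    "max_temperature": 0.20,
    "min_temperature": 0.30,
}

def final_score(element_scores, weights):
    """Weighted sum of the six element scores (each 0-100)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[k] * element_scores[k] for k in weights)

# Hypothetical element scores for a single forecast.
scores = {
    "wind": 100,
    "state_of_sky": 90,
    "precipitation": 80,
    "visibility": 100,   # weight 0 in January, so it does not contribute
    "max_temperature": 95,
    "min_temperature": 100,
}
print(round(final_score(scores, WEIGHTS_JANUARY), 1))  # -> 93.5
```

Because the weightings sum to one, the final score stays on the same 0-100 scale as the element scores, which is what makes forecaster and persistence scores directly comparable.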
4. Comparison between the public's subjective ratings and objective verification scores

The subjective ratings given by the public at the time a survey is conducted depend greatly on the public's memory of and impressions about past forecast accuracy. Lagged correlation analyses are carried out in this paper to find the relation between past objective scores and the subjective rating of weather forecasts. Lagged correlations between the public's subjective ratings and the rolling average objective verification scores of the preceding 1, 3, 6, 12, 24, 36 and 48 months respectively are determined for the period 1992-2008. For example, in the case of 48 months, the subjective rating in April 1996 is paired with the average objective verification score from April 1992 to March 1996, and the subjective rating in October 2007 with the average objective verification score from October 2003 to September 2007. A two-tailed t-test (Draper and Smith, 1981) is applied to test the statistical significance of the correlations.

To build a regression model that uses the objective verification scores to predict the subjective ratings, stepwise regression analysis (Draper and Smith, 1981) is adopted. Stepwise regression is a commonly used technique for statistical model building in cases where there is a large number of potential explanatory variables but no underlying theory on which to base the model selection. A scatter diagram of subjective ratings against predicted ratings is plotted with data points divided into three periods of roughly the same length, 1992-1996, 1997-2002 and 2003-2008, to see if there is any systematic pattern over time.

Table 8 shows that the lagged correlation coefficients are all close to 0.4 (0.382 to 0.444) and statistically significant at the 5% level. The highest coefficient, 0.444, is for 48 months. Using forward stepwise regression, the equation obtained is:

    y = 0.314 x₃ + 0.6 x₄₈ − 25.656
with a multiple correlation coefficient of 0.51, statistically significant at the 5% level. In the equation, x₃ denotes the 3-month (short-term) and x₄₈ the 48-month (long-term) rolling average respectively. A scatter plot of subjective ratings against the predicted ratings is presented in Figure 9.

The stepwise regression equation suggests that the public's subjective ratings are related not only to short-term but also to long-term objective verification scores. The predicted ratings based on objective verification scores generally show a rising tendency with time (Figure 9), indicating scientific improvement in forecasting skill. Such a memory effect points to the importance of publicity activities such as public education and outreach work (Lam, 2005). It is also interesting to note that, compared with earlier periods, the subjective ratings in recent years (2003-2008) show greater dispersion and larger deviations from the predicted ratings. This will be further studied.

5. Conclusion

This paper presents the essential features of the HKO objective verification scheme for weather forecasts. The methodology employed, including categorical, spatial and user-oriented verification, is also described. The objective verification score shows a general rising trend over the years, in line with scientific advances. The objective verification score is consistently higher than the persistence score, indicating that weather forecasters possess skill in predicting weather changes. The public's subjective ratings of weather forecast accuracy are related not only to short-term but also to long-term objective verification scores. This suggests that, apart from scientific work, activities such as public education and outreach are essential in raising the public's perception of forecast accuracy.
References

ACI (Accredited Certification International Limited), 2008: Public Opinion Survey on the Accuracy of Weather Forecasts in Hong Kong - Survey Report, 84 pp, October 2008.

Draper, N. and H. Smith, 1981: Applied Regression Analysis, 2nd Edition, New York: John Wiley & Sons, Inc.

Lam, C.Y., 2005: The Role of National Meteorological and Hydrological Services in Natural Disaster Reduction, WMO Bulletin, Volume 54, No. 4.

Li, S.W., 1997: Hong Kong Observatory's Objective Forecast Verification Scheme, Hong Kong Observatory Technical Note (Local) No. 70.
Table 1. Weightings (in %) for weather forecast elements at the beginning of each month

Month       Wind   State of sky   Precipitation   Visibility   Max temperature   Min temperature
January      15         15             20             0              20                30
February     10         10             20            10              20                30
March         5          5             30            30              15                15
April         5          5             40            30              10                10
May          10         10             60             0              10                10
June         20         10             60             0               5                 5
July         20         15             60             0               3                 2
August       20         15             60             0               3                 2
September    20         20             50             0               5                 5
October      20         20             30             0              15                15
November     20         20             20             0              15                25
December     15         20             20             0              15                30

Table 2. Wind category and range of wind speed

Wind category       Wind speed (m/s)
Moderate            0 - 8
Moderate to fresh   5.5 - 9.5
Fresh               8 - 11
Fresh to strong     9.5 - 14.0
Strong              11 - 17
Strong to gale      14.0 - 20.5
Gale                17 or above
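The full-marks test for wind in Section 2.1 amounts to a range lookup against Table 2. A minimal sketch (the deduction curve of Figure 1 for out-of-range speeds is not reproduced here; the function names are illustrative):

```python
# Wind-speed ranges (m/s) for each forecast category, from Table 2.
WIND_RANGES = {
    "moderate": (0.0, 8.0),
    "moderate to fresh": (5.5, 9.5),
    "fresh": (8.0, 11.0),
    "fresh to strong": (9.5, 14.0),
    "strong": (11.0, 17.0),
    "strong to gale": (14.0, 20.5),
    "gale": (17.0, float("inf")),
}

def wind_full_marks(category, actual_speed):
    """True if the averaged hourly mean wind speed falls within the
    range of the forecast category, i.e. the forecast gets 100 marks."""
    low, high = WIND_RANGES[category]
    return low <= actual_speed <= high

print(wind_full_marks("fresh", 9.0))   # True: 9 m/s is within 8-11 m/s
print(wind_full_marks("fresh", 12.5))  # False: marks deducted per Figure 1
```

Note that adjacent categories overlap (e.g. "moderate" and "moderate to fresh"), so a broader forecast category trades precision for a wider full-marks band.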
Table 3. State of sky category, ranges of sunshine duration and cloud amount

Category           Sunshine duration (%)   Mean cloud amount (oktas)
Overcast                  0                       7.6 - 8
Cloudy                    0 - 5                   6.1 - 7.5
Bright                    5.1 - 10                6.1 - 7.5
Mainly fine              10.1 - 50                0 - 6
Fine/sunny/clear         50.1 or above            0 - 6

Table 4. Precipitation category and rainfall range

Category      24-hour rainfall amount (mm)
No rainfall   Nil
Light         0 < rainfall <= 5
Moderate      5 < rainfall <= 25
Heavy         rainfall > 25

Table 5. Additional score for thunderstorm

Forecast thunderstorm   Thunderstorm reported   Score
Yes                     Yes                     +20
Yes                     No                      -5
No                      Yes                     -5
No                      No                      0
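The category boundaries of Table 4 can be expressed as a simple classifier. A sketch for the 24-hour case only; the proportional scaling of the thresholds for shorter or longer forecast periods (Section 2.3) is omitted:

```python
def rainfall_category(rainfall_mm):
    """Classify a 24-hour rainfall amount (mm) into the forecast
    categories of Table 4. Upper bounds are inclusive, as in the table."""
    if rainfall_mm <= 0:
        return "no rainfall"
    if rainfall_mm <= 5:
        return "light"
    if rainfall_mm <= 25:
        return "moderate"
    return "heavy"

print(rainfall_category(0))     # no rainfall
print(rainfall_category(3.2))   # light
print(rainfall_category(25))    # moderate (boundary value included)
print(rainfall_category(40))    # heavy
```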
Table 6. Visibility category and visibility range

Category         Visibility (m)
Fog              visibility <= 1000
Mist             1000 < visibility < 5000
Haze             visibility < 5000
Low visibility   visibility < 5000

Table 7. Monthly mean score from 1989-2008

Month       Mean score
January       92.6
February      92.0
March         90.4
April         89.1
May           90.5
June          89.6
July          90.0
August        89.8
September     90.5
October       93.4
November      93.6
December      93.8

Table 8. Lagged correlation coefficients between the public's subjective ratings and the rolling average objective verification scores of the preceding 1, 3, 6, 12, 24, 36 and 48 months

Period      Correlation coefficient   Statistically significant at 5% level?
1 month           0.386                            Yes
3 months          0.418                            Yes
6 months          0.406                            Yes
12 months         0.441                            Yes
24 months         0.407                            Yes
36 months         0.382                            Yes
48 months         0.444                            Yes
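The lagged correlations in Table 8 pair each half-yearly subjective rating with the mean objective score over the preceding N months. The sketch below illustrates the computation with hypothetical monthly scores and ratings; the actual survey and score data are not reproduced here:

```python
# Pair each survey rating with the rolling mean of the monthly objective
# scores over the preceding `lag` months, then correlate the pairs.

def rolling_mean(scores, end_index, lag):
    """Mean of scores[end_index - lag : end_index]."""
    window = scores[end_index - lag:end_index]
    return sum(window) / len(window)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical monthly objective scores and half-yearly survey ratings.
monthly_scores = [88 + 0.1 * m for m in range(60)]   # slowly rising scores
survey_months = [12, 18, 24, 30, 36, 42, 48, 54, 60]  # surveys twice a year
ratings = [74, 74.5, 75, 75.2, 75.8, 76, 76.3, 76.9, 77]

lag = 3  # 3-month rolling average, as in the x3 predictor
xs = [rolling_mean(monthly_scores, m, lag) for m in survey_months]
r = pearson(xs, ratings)
print(round(r, 3))  # close to 1 here, since both synthetic series rise steadily
```

Repeating this for each lag (1, 3, ..., 48 months) produces a table of the same shape as Table 8.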
Figure 1. Marking scheme for fresh wind (solid line; marks against wind speed in m/s). The dotted line is used if the actual wind does not belong to the fresh category when the forecast is issued.

Figure 2. (a) Marking scheme of sunshine duration for the cloudy category (solid line; marks against sunshine duration in %). The dotted line is used if the sunshine duration is not 0-5% at the time the forecast is issued. (b) Marking scheme of cloud amount for the fine/sunny/clear category (solid line; marks against cloud amount in oktas). The dotted line is used if the cloud amount is not 0-6 oktas when the forecast is issued.
Figure 3. Location of rain-gauges for precipitation verification: Tsim Bei Tsui, Tai Po, Pak Tam Au, So Uk Estate, Discovery Bay, Hong Kong Observatory Headquarters and Stanley.

Figure 4. Marking scheme of precipitation for the categories no rainfall, light, moderate and heavy (solid lines; marks against rainfall amount in mm). For each category, the dotted line is used if the precipitation does not belong to the corresponding category when the forecast is issued.
Figure 5. Marking scheme for the visibility category of fog (solid line; marks against visibility in km). The dotted line is used if the visibility is not equal to or below 1000 m when the forecast is issued.

Figure 6. Marking scheme for maximum/minimum temperature (marks against temperature deviation from actual, in °C).

Figure 7. Time series of the monthly mean verification score of the Local Weather Forecast from 1989 to 2008, with 12-month running average.
Figure 8. Comparison between verification score and persistence score (1989-2008).

Figure 9. The public's subjective rating of the accuracy of weather forecasts versus the predicted rating from the stepwise regression equation of objective verification scores, with data points grouped into the periods 1992-1996, 1997-2002 and 2003-2008 and the line y = x shown.