More Quality Control for Weather Station Networks
Vesa Hasu
Contents
- Introduction
- The quality control motivation
- Some basic vocabulary
- Quality control overview
- Quality control details
- Further quality control problems
- From measurement quality control to network quality control
Introduction
- The quality control research was carried out in the Dada project (1.6.2005 to 31.1.2007)
- Dada = Development of Data Fusion and Diagnostics Methods in Weather Station Networks (http://control.hut.fi/research/dada/)
- Dada also includes data fusion of weather measurements from multiple sources
- The sequel: Pipo (Quality and fusion of surface weather stations and dual polarization radar measurements)
- Collaboration between Helsinki University of Technology (TKK), Finnish Meteorological Institute (FMI) and Vaisala
- Dada is funded by Tekes and Vaisala
Quality Control Motivation
- Why is quality control in weather station networks important right now?
- A new generation of weather stations: cheaper and more accurate
  - Results in denser measurements in both space and time
- New types of forecast products
  - All measurements used in forecasting should be reliable, which calls for rapid quality control
- The number of measurements grows by decades (orders of magnitude)
  - Automated procedures are needed for quick QC decisions
Quality Control Motivation
- Consider the difference between a synoptic-scale and a mesoscale measurement network:
  - If the average station spacing decreases by a factor of 4-5, the number of measurement stations increases by a factor of about 20
  - The average sampling period decreases by a factor of 12, from 1 hour to 5 minutes
- The number of measurements can easily grow to approximately 200 times larger than before
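The growth estimate above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the stations cover a two-dimensional area, so that the station count scales with the square of the spacing reduction. The function name `growth_factor` is illustrative and does not come from the project.

```python
def growth_factor(spacing_factor, old_period_min, new_period_min):
    """Growth in the number of measurements when a network gets denser and faster.

    Assumption: stations cover a 2-D area, so the station count scales with
    the square of the factor by which the spacing shrinks.
    """
    station_factor = spacing_factor ** 2
    sampling_factor = old_period_min / new_period_min
    return station_factor * sampling_factor


# Spacing reduced by a factor of 4 to 5, sampling period from 60 min to 5 min
low = growth_factor(4, 60, 5)
high = growth_factor(5, 60, 5)
print(low, high)
```

Under these assumptions the factor falls between roughly 190 and 300, which agrees with the "approximately 200 times" order of magnitude quoted on the slide.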
Quality Control Motivation
- In this presentation: quality control = fault detection (of meteorological measurements)
- From the control engineering point of view, fault diagnosis of meteorological measurements can be tedious:
  - Unknown process
  - Time-varying system (e.g. seasonal changes)
  - Geography-dependent measurements (e.g. inland/sea)
- The first conclusions:
  - Adaptation is needed
  - Traditional fault detection methods do not apply
Some Basic Vocabulary
- Some basic error types:
  - Spike
  - Bias
  - Noise
  - Drift
- The fault detection method presented next can detect noise and spike errors
Noise Quality Control Overview
- At a general level, noise fault detection consists of three steps:
  - Estimate a residual
  - Compare the residual to the alarm thresholds
  - Update the residual estimation model and the alarm thresholds
- A more detailed block diagram: [Figure: block diagram]
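The three steps listed above can be sketched as a simple loop. This is a minimal illustration, not the project's implementation: an exponentially smoothed level stands in for the self-tuning model and Kalman filter described later, and the names `detect_noise_faults`, `lam`, `n_sigma` and `init_var` are hypothetical. Skipping the update for flagged samples is also an assumption of this sketch.

```python
import math


def detect_noise_faults(measurements, n_sigma=3.0, lam=0.95, init_var=1.0):
    """Return a list of booleans, True where a measurement is flagged."""
    estimate = measurements[0]   # stand-in model state: a smoothed level
    variance = init_var          # running estimate of the residual variance
    flags = [False]              # the first sample only initializes the model
    for x in measurements[1:]:
        # Step 1: estimate a residual
        residual = x - estimate
        # Step 2: compare the residual to the alarm threshold
        flagged = abs(residual) > n_sigma * math.sqrt(variance)
        flags.append(flagged)
        # Step 3: update the model and the threshold (assumption: only with
        # unflagged samples, so that faults do not contaminate the statistics)
        if not flagged:
            estimate = lam * estimate + (1.0 - lam) * x
            variance = lam * variance + (1.0 - lam) * residual ** 2
    return flags


data = [10.0] * 20 + [25.0] + [10.0] * 5
print(detect_noise_faults(data))
```

In this toy series only the isolated jump to 25.0 is flagged; the samples after it pass again because the model state was not updated with the faulty value.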
Noise Quality Control Overview
- Advantages of the solution:
  - Adapts to seasonal changes
  - Applicable to different conditions thanks to the adaptation
  - Diurnal changes in the measurement statistics are fairly easy to cope with
  - No need for a background field (from e.g. LAPS)
  - Applicable to a single station or a multiple-station environment
  - Possibility to estimate a few missing measurements
Preprocessing
- The preprocessing flags the most obvious errors
- The flagging is based on step and consistency checks
- The preprocessing is essential, since large errors may bias the residual estimation of the actual fault detection
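The slides do not define the step and consistency checks further. The sketch below assumes their common meaning in meteorological quality control: a step check limits the change between successive samples, and a consistency check compares physically related variables, here dew point against air temperature. The function names, the `max_step` limit and the choice of variables are all hypothetical.

```python
def step_check(values, max_step):
    """Flag a value if it jumps more than max_step from the last unflagged value."""
    flags = []
    last_good = None
    for v in values:
        bad = last_good is not None and abs(v - last_good) > max_step
        flags.append(bad)
        if not bad:
            last_good = v
    return flags


def consistency_check(temperature, dew_point, tolerance=0.0):
    """Flag samples where the dew point exceeds the air temperature."""
    return [td > t + tolerance for t, td in zip(temperature, dew_point)]


print(step_check([10.0, 10.2, 30.0, 10.3], max_step=5.0))
print(consistency_check([5.0, 5.0], [3.0, 6.0]))
```

Comparing against the last unflagged value keeps a single spike from also flagging the good sample that follows it. The price is that a genuine, persistent level change would keep being flagged, so a real implementation would need some reset rule.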
Modeling and Filtering
- Self-tuning modeling of the measurement behavior
  - Autoregressive time series model, i.e. the new measurement is assumed to depend on the previous measurements
  - The model is updated by the recursive least squares method
- Filtering is done by a Kalman filter
  - The Kalman filter offers optimal linear estimation
  - The covariance matrices required by the Kalman filter are also estimated by recursive methods
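A self-tuning autoregressive model of this kind might look as follows. This is a textbook recursive least squares (RLS) update with exponential forgetting, offered as a sketch under assumptions: the model order, the `forgetting` value and the initial covariance scale `init_cov` are illustrative choices, not values taken from the project.

```python
import numpy as np


class RecursiveARModel:
    """AR(p) one-step predictor with coefficients updated by recursive least squares."""

    def __init__(self, order=2, forgetting=0.99, init_cov=1000.0):
        self.order = order
        self.forgetting = forgetting          # hypothetical forgetting factor, 0 < value <= 1
        self.theta = np.zeros(order)          # AR coefficients, most recent lag first
        self.P = np.eye(order) * init_cov     # RLS covariance-like matrix

    def _regressor(self, history):
        # Last `order` measurements, most recent first; history must hold at least `order` values
        return np.asarray(history[-self.order:][::-1], dtype=float)

    def predict(self, history):
        """Predict the next measurement from the previous ones."""
        return float(self.theta @ self._regressor(history))

    def update(self, history, new_value):
        """One RLS step; returns the prediction error made before the update."""
        phi = self._regressor(history)
        error = new_value - float(self.theta @ phi)
        gain = self.P @ phi / (self.forgetting + phi @ self.P @ phi)
        self.theta = self.theta + gain * error
        self.P = (self.P - np.outer(gain, phi) @ self.P) / self.forgetting
        return error


# Noise-free test series x(k) = 1.5 x(k-1) - 0.7 x(k-2)
x = [1.0, 0.5]
for _ in range(40):
    x.append(1.5 * x[-1] - 0.7 * x[-2])

model = RecursiveARModel(order=2, forgetting=1.0)
for k in range(2, len(x)):
    model.update(x[:k], x[k])
print(model.theta)
```

A forgetting factor below 1 lets the coefficients follow slow changes such as the seasonal variation mentioned earlier, at the cost of noisier estimates.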
Residual Computation
- The idea is to filter the measurement and use the filtering residual for fault detection
- The model used in the Kalman filter is identified by self-tuning recursive least squares estimation
- The advantages: easy to implement for different measurements, and robust against seasonal changes
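The residual computation can be illustrated with a scalar Kalman filter. The sketch makes several assumptions that the slides do not confirm: a first-order state model with coefficient `a`, fixed noise variances `q` and `r` (the slides say these are estimated recursively), and the innovation (measurement minus one-step prediction) taken as the residual.

```python
def kalman_residuals(measurements, a=1.0, q=0.01, r=0.25, p0=1.0):
    """Scalar Kalman filter; returns (filtered values, innovation residuals)."""
    x = measurements[0]   # initial state estimate
    p = p0                # initial estimate variance
    filtered, residuals = [], []
    for y in measurements:
        # Prediction with the assumed state model x(k) = a * x(k-1) + w(k)
        x_pred = a * x
        p_pred = a * p * a + q
        # Residual (innovation): measurement minus one-step prediction
        residual = y - x_pred
        # Correction with the assumed measurement model y(k) = x(k) + v(k)
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * residual
        p = (1.0 - gain) * p_pred
        filtered.append(x)
        residuals.append(residual)
    return filtered, residuals


filtered, residuals = kalman_residuals([5.0] * 5 + [9.0] + [5.0] * 5)
print(residuals)
```

In a self-tuning setup the state model would come from the recursively identified autoregressive model rather than from a fixed `a`.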
Measurement Flagging
- A measurement is flagged as erroneous if the magnitude of the corresponding residual is larger than three (or four) times the standard deviation of the residual
- The 3σ and 4σ thresholds come from the assumption of a normally distributed residual
- According to the theory, the 3σ threshold corresponds to passing more than 99.5 % of the measurements
- Since the residual does not exactly follow the normal distribution, the 4σ threshold should be used in practice
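The pass rates behind the 3σ and 4σ thresholds follow from the normal distribution and can be computed with the error function. The function names `pass_fraction` and `is_flagged` below are illustrative.

```python
import math


def pass_fraction(n_sigma):
    """Fraction of a normal distribution within +/- n_sigma standard deviations."""
    return math.erf(n_sigma / math.sqrt(2.0))


def is_flagged(residual, sigma, n_sigma=4.0):
    """True if the residual magnitude exceeds n_sigma standard deviations."""
    return abs(residual) > n_sigma * sigma


for n in (3.0, 4.0):
    print(n, pass_fraction(n))
```

For an exactly normal residual, the 3σ threshold passes about 99.7 % of the measurements. At one sample every 5 minutes (288 per day) that still means a false alarm roughly every day or two per sensor, while 4σ brings the rate down to about 0.02 per day, which is one practical argument for the wider threshold.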
Filtering Example
- An example of temperature filtering: Suvisaaristo with added noise
[Figure: upper panel, T (°C) against iteration (5 min): thick black = measured temperature, thin black = temperature with added noise, red = estimated temperature. Lower panel, difference (°C) against iteration (5 min): black = the added noise, red = error of the estimated temperature.]
Another Filtering Example
- The difficulty of weather measurement filtering: Svartviken
[Figure: two panels of Svartviken barometric pressure (hPa) against iteration (5 min), one with β = 0.9 and one with β = 0.99. Black = measured barometric pressure, purple = filtered barometric pressure.]
Residual Examples
- Suvisaaristo barometric pressure and its residual
[Figure: Suvisaaristo A, 12.7.2006 to 18.7.2006. Upper panel: barometric pressure p (hPa). Lower panel: residual (hPa).]
Residual Examples
- Suvisaaristo temperature and its residual
[Figure: Suvisaaristo A, 12.7.2006 to 18.7.2006. Upper panel: temperature T (°C). Lower panel: residual (°C).]
Time Dependent Alarm Thresholds
- Since the measurements are highly environment dependent, the alarm thresholds must be adaptive
- The variation of some meteorological measurements may depend on the time of day
- The approach taken: estimate the residual variation based on the time of day
Alarm Thresholds: A Practical Recursive Version
- The recursive update of the residual variance, and thereby of the alarm threshold, is

  $$\hat{\sigma}^2(k) = \lambda\,\hat{\sigma}^2(k-1) + (1-\lambda)\,\bigl(x(k) - \hat{x}(k)\bigr)^2$$

  - Good for e.g. barometric pressure, whose variance does not change diurnally
- The update rule for a measurement with diurnal variance change is

  $$\hat{\sigma}^2(k) = \lambda\,\hat{\sigma}^2(k-288) + (1-\lambda)\,\bigl(x(k) - \hat{x}(k)\bigr)^2$$

  - With 5-minute sampling, 288 samples correspond to one day, so the update uses the estimate from the same time on the previous day
  - Good for e.g. temperature and relative humidity, whose variance depends strongly on the lighting conditions
  - Additional temporal smoothing must be done
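Both update rules can be covered by one small class that keeps a separate variance estimate for each time-of-day slot. This is a sketch under assumptions: `period=1` gives the plain recursive update and `period=288` gives the diurnal version for 5-minute data. The class name, the default `lam` and the initial variance are illustrative, and the additional temporal smoothing mentioned on the slide is not included.

```python
import math


class RecursiveVariance:
    """Recursive residual variance with one estimate per time-of-day slot."""

    def __init__(self, lam=0.95, period=1, init_var=1.0):
        self.lam = lam
        self.slots = [init_var] * period   # slot i holds the estimate for sample index i modulo period
        self.k = 0                         # index of the next sample

    def threshold(self, n_sigma=3.0):
        """Alarm threshold for the next sample, from the estimate one period back."""
        return n_sigma * math.sqrt(self.slots[self.k % len(self.slots)])

    def update(self, residual):
        """Apply sigma2(k) = lam * sigma2(k - period) + (1 - lam) * residual**2."""
        i = self.k % len(self.slots)
        self.slots[i] = self.lam * self.slots[i] + (1.0 - self.lam) * residual ** 2
        self.k += 1
        return self.slots[i]


pressure_var = RecursiveVariance(lam=0.95, period=1)       # no diurnal variance change
temperature_var = RecursiveVariance(lam=0.95, period=288)  # one slot per 5-minute step of the day
print(pressure_var.threshold(), temperature_var.threshold())
```

Storing one value per slot means the diurnal version only needs the estimates of the previous day, not the whole residual history.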
Residual Example
- An example of a relative humidity measurement
[Figure: 15.11.2005 to 16.11.2005. Upper panel, relative humidity (%): black = measurement, red = filtered measurement. Lower panel, residual (%): black = 3σ alarm threshold, red = residual.]
Residual Example
- An example of a temperature measurement with erroneous behavior
[Figure: 15.7.2006. Upper panel: temperature (°C). Lower panel: residual (°C).]
Residual Example: Two Stations
- Juhanila stations, August 2005
  - A station: 2 m level
  - B station: 100 m level
[Figure: for each of the Juhanila A and Juhanila B stations, temperature T (°C) and the residual, i.e. prediction error (°C), against iteration.]
Further Quality Control Problems
- As mentioned earlier, the current fault detection works well in detecting spikes and noise
- Fault/weather phenomena separation is always an issue
- Drift detection is a much more tedious task
  - Detection seems possible using either neighboring stations or nowcasted values
  - An exception: barometric pressure
- During heavy weather phenomena the measurements behave in special ways that are not yet included in the fault detection (e.g. wind, pressure and temperature in convection)
- Fault detection of non-continuous measurements: rain
- Room for further QC work
From Measurement Quality Control to Measurement Network Performance
- For maintenance and forecasting purposes, knowing the measurement station performance is also essential
  - Additional information for the end user about the measurement quality
  - Maintenance operation planning
- To gain information about the network condition, descriptive performance indices for each station can be used
  - For example: availability, accuracy, reliability, estimability, influence
Descriptions of Measurement Network Performance
- Short descriptions of the performance indices:
  - Availability = are there missing measurements
  - Accuracy = is the measurement accurate or not
  - Reliability = can the measurement station be trusted
  - Estimability = can the network compensate for the measurement (for data users)
  - Influence = how the measurement influences the network performance (for maintenance)
A Performance Index Example
- Example temperature measurements
[Figure: example temperature series labeled "Measurement OK", "Low accuracy" and "Low availability".]
Conclusions
- Automatic quality control is a necessity when increasing the size of a mesoscale measurement network
- Traditional quality control is the starting point; newer types of algorithms can improve the results
- Relying on a background field is not always feasible
- Quality control gives an opportunity to form new metadata for maintenance decisions