Application and results of automatic validation of sewer monitoring data




M. van Bijnen 1,3 * and H. Korving 2,3

1 Gemeente Utrecht, P.O. Box 8375, 3503 RJ, Utrecht, The Netherlands
2 Witteveen+Bos Consulting Engineers, P.O. Box 233, 7400 AE, Deventer, The Netherlands
3 Department of Sanitary Engineering, Delft University of Technology, P.O. Box 5048, 2600 GA, Delft, The Netherlands
*Corresponding author, e-mail m.van.bijnen@utrecht.nl

ABSTRACT

In Utrecht, a monitoring network has been installed in order to check the reliability of the theoretical hydrodynamic sewer model and to study the hydraulic performance of the sewer system. Flows, water levels, rainfall and turbidity are monitored at several locations in the system. The total monitoring network consists of 138 sensors and will be extended in the near future. Managing, analysing and presenting measured data, however, is a very extensive job because of the enormous number of measurements registered every day. Therefore, an automatic validation tool has been developed for the validation of large data sets of sewer measurements. The tool is based on the correlation between sensors and on deviations in system behaviour. Depending on the type of instrument (water level, flow, rainfall or turbidity), one or more statistical models are used to automatically diagnose the quality of measurements ('correct', 'uncertain' and 'incorrect'). In order to prevent erroneous quality labels, e.g. the label 'incorrect' caused by construction works in the sewer system, frequent consultation of the management authority is needed. After evaluation, the data are suitable for planning and design purposes, such as decisions on investments and model calibration.
KEYWORDS

Data quality, data validation, sewer measurements, monitoring network, validation tool

INTRODUCTION

In order to comply with emission standards and reduce combined sewer overflows (CSOs), numerous expensive investments are required, including building settling tanks and sewers, enlarging pipes and reducing the (paved) area connected to the sewer system. In Utrecht (the Netherlands), an investment of approximately 40 million Euro is mainly based on the results of theoretical hydrodynamic models. In general, however, the accuracy of the theoretical model in comparison with reality is unknown. Consequently, uncertainty in the model results affects decision-making on the investments.

Reliable model calculations require a high-quality sewer database and estimates of dry weather flow and surface runoff. The theoretical model of Utrecht comprises 19,404 nodes and 21,091 conduits, with 227 overflows and 232 pumps. The total dry weather flow consists of 2,814 m³/h domestic and 1,419 m³/h industrial wastewater. The runoff area is estimated at 1,482 ha contributing to the combined sewer system and 124 ha contributing to the improved separate system. The combined sewer system in Utrecht is divided into 22 subsystems, and wastewater is transported between the districts by pumps. The system consists of approximately 650 km of sewers and 184 CSOs.

In order to check the reliability of the theoretical model and understand the hydrodynamic behaviour of the system, a monitoring network has been installed in the combined sewer system. Flows, water levels, rainfall and turbidity are monitored at several locations. The total monitoring network consists of 138 sensors and will be extended in the near future. The extension includes approximately 140 water level sensors, 6 rain gauges and several flow sensors. Every day at least 55,000 measurements are stored in a database. Without data processing and validation, this results in a large, inaccessible data set of unknown quality. Therefore, validation of measured data is necessary. Validation not only provides information on the functioning of the measuring equipment, but also reduces the large volume of measurement data so that it becomes accessible. Finally, validated data increase the reliability of model results and investments.

There are several examples of automatic data validation in the field of sewer systems and wastewater treatment plants (e.g. Mourad and Bertrand-Krajewski 2002, Yoo et al. 2006). However, these applications mainly concern small research projects. This paper discusses the practical use of an automatic validation tool for the validation of large data sets of sewer measurements. The procedure is presented together with the results in Utrecht and is illustrated with several examples.

PROCEDURE FOR DATA VALIDATION

Measurements in sewer systems need validation before they can be used for planning and maintenance. It is widely acknowledged that this partly results from the extreme conditions in sewers in terms of fouling and corrosion of instruments (see e.g. Mourad and Bertrand-Krajewski 2002, Rosen et al. 2003). Therefore, data validation and processing are vital elements in measuring campaigns.
Usually, the amount of data collected in permanent measuring campaigns is very large. As a result, manual validation is very time consuming and possibly inaccurate. For the validation of large data sets of sewer measurements an automatic validation tool has been developed (Ottenhoff et al. 2007). This tool has been used in several monitoring projects of sewer systems in the Netherlands.

The tool is based on spatial correlation between sensors and on deviations in system behaviour. Depending on the type of instrument (water level, flow, rainfall or turbidity), one or more models are used to determine whether a measurement is correct. The validation tool comprises several standard checks (such as double measurements, out of range measurements and missing values) and a more site-specific control model. The successive validation steps are shown in Figure 1. The control model consists of relatively simple regression models that are calibrated using stepwise regression. The main advantage of this approach is that malfunctioning sensors are automatically left out. The data are also checked for monotone and sudden trends.

The tool automatically diagnoses the quality of measurements ('correct', 'uncertain' and 'incorrect'), if possible by individually validating each measurement of a sensor. In order to prevent erroneous quality labels, e.g. due to construction works in the sewer system, frequent consultation of the sewer management authority is needed. After evaluation, the data can be applied for planning and design purposes.

By applying the validation tool, the following questions are answered:
- Is the sensor working?
- Are the measurements reliable?
- Can the measurements be used for planning purposes?
- Can the measurements be used for calibration of the hydrodynamic model?
- Which measurements have to be investigated in more detail?

In the next part of this paper the validation procedure shown in Figure 1 is explained using practical examples. Results of the application in Utrecht are also presented.

Figure 1. Flow chart of the validation procedure. For each sensor type (rainfall, level, flow, quality parameter) the raw data pass through quick scan and pre-treatment, the checks for double measurements, out of range values, missing values and trends, and the statistical model (error analysis, control model), resulting in quality labels.

Quick scan and pre-treatment

Prior to validation, pre-treatment can be necessary, e.g. in case of a mismatch between measuring frequency and sampling time. This requires synchronisation of the intervals between subsequent measurements to, for example, 5 minutes. Another example is filling gaps due to missing data. In order to validate the data, gaps are filled irrespective of their length.

Pre-treatment of the data consists of interpolation and aggregation. Synchronisation and gap filling are based on interpolation. Level measurements are very suitable for interpolation because of their nearly continuous character. The behaviour of turbidity in a sewer system, however, is much more irregular. Consequently, important information can be lost through interpolation. Furthermore, aggregation of data can be necessary because of low correlation between values at the original time scale. This often results from fast variations in process dynamics.
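The synchronisation step described above can be sketched as linear interpolation of irregular readings onto a regular 5-minute grid. This is a minimal illustration and not the tool's actual implementation: the function name `resample_linear`, the sample values and the choice to leave grid points outside the sampled period empty are all assumptions.

```python
from datetime import datetime, timedelta


def resample_linear(samples, start, end, step=timedelta(minutes=5)):
    """Interpolate sorted, non-empty (timestamp, value) samples onto a regular grid.

    Grid points before the first or after the last sample are returned as
    None rather than extrapolated (an assumption made for this sketch).
    """
    grid = []
    t = start
    i = 0  # index of the last sample at or before t
    while t <= end:
        # Advance to the pair of samples that brackets t.
        while i + 1 < len(samples) and samples[i + 1][0] <= t:
            i += 1
        t0, v0 = samples[i]
        if t < t0 or (i + 1 == len(samples) and t > t0):
            grid.append((t, None))
        elif t == t0:
            grid.append((t, v0))
        else:
            t1, v1 = samples[i + 1]
            frac = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float
            grid.append((t, v0 + frac * (v1 - v0)))
        t += step
    return grid


# Hypothetical water levels (m) logged at irregular times.
raw = [
    (datetime(2007, 1, 1, 0, 2), 0.40),
    (datetime(2007, 1, 1, 0, 12), 0.60),
    (datetime(2007, 1, 1, 0, 27), 0.30),
]
grid = resample_linear(raw, datetime(2007, 1, 1, 0, 0), datetime(2007, 1, 1, 0, 30))
```

Because every grid point between two readings is interpolated, a gap of any length between two samples is filled, which matches the text. For irregular signals such as turbidity the same interpolation can hide short peaks.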
Aggregation helps, for example, with rainfall: the correlation between precipitation and water level is low at a sampling interval of 5 minutes because of the time needed for runoff. Aggregation to hourly values increases the correlation. A disadvantage of aggregation is that measurements cannot be validated individually at the original time scale. However, it enables a more reliable assessment at a larger time scale.

Double measurements

The next step is a check for double measurements. This means that the same date and time label occurs more than once in the data set, but with a different observed value. Incorrect programming of readout software can cause such errors. Figure 2 shows an example of shifted dates between two readouts in 2005, the first one on November 11, the second on December 22. Measured values are shifted by 10 minutes between the two readouts. Another cause of double measurements is a sudden switch of month and day in the date format (Figure 3).

Figure 2. Measured values shifted in time (readouts of 11/11/2005 and 22/12/2005)

Figure 3. Switching day and month in date format

Out of range measurements

Measurements exceeding the maximum or minimum limits of the sensor or the physical range (e.g. below the manhole bottom) are labelled as 'incorrect'. Figure 4 shows an example of out of range flow measurements at a pumping station. The maximum flow is 720 m³/h (60 m³ per 5 minutes). The registered values exceed this maximum several times. Therefore, further research on site is needed.
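The two checks above can be sketched in a few lines. The snippet is illustrative only: the function names, the record layout and the sample values are hypothetical, and only the limit of 60 m³ per 5 minutes is taken from the text.

```python
from collections import defaultdict


def find_double_measurements(records):
    """Return time labels that occur more than once with different values."""
    values_by_time = defaultdict(set)
    for timestamp, value in records:
        values_by_time[timestamp].add(value)
    return sorted(t for t, values in values_by_time.items() if len(values) > 1)


def label_out_of_range(records, lower, upper):
    """Label a record 'incorrect' if its value lies outside [lower, upper].

    'passed' only means that this single check raised no objection.
    """
    return [
        (timestamp, value, "passed" if lower <= value <= upper else "incorrect")
        for timestamp, value in records
    ]


# Hypothetical pumped volumes in m3 per 5 minutes.
records = [
    ("2005-11-11 10:00", 42.0),
    ("2005-11-11 10:05", 58.5),
    ("2005-11-11 10:05", 61.2),  # same time label, different value
    ("2005-11-11 10:10", 75.0),  # above the pump capacity
]
doubles = find_double_measurements(records)
labels = label_out_of_range(records, lower=0.0, upper=60.0)
```

Identical repeats of the same reading are not reported by `find_double_measurements`, in line with the definition in the text that a double measurement has the same time label but a different observed value.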

Figure 4. Out of range measurements at pumping station Neutronweg

Missing values

Missing values are treated as incorrect data. In order to save storage capacity a variable sampling interval can be applied. However, an incorrectly programmed sampling interval causes missing measurements. Loss of data can also result from limited storage in the sensor, problems with telemetry, loss of power supply or malfunctioning of equipment.

Figure 5. Correctly programmed variable measurement interval
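A simple way to locate missing values is to compare the time between consecutive readings with the expected sampling interval. The sketch below assumes a fixed interval of 5 minutes; the function name `find_gaps` and the timestamps are hypothetical. A sensor with a variable interval would need an expected interval that depends on the measured value.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(minutes=5)):
    """Return (gap_start, gap_end, n_missing) for every interval with missing readings."""
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        interval = current - previous
        # Ceiling division: readings expected strictly between the two timestamps.
        n_missing = -(-interval // expected) - 1
        if n_missing > 0:
            gaps.append((previous, current, n_missing))
    return gaps


# Hypothetical series with readings missing between 10:10 and 10:30.
base = datetime(2007, 1, 1, 10, 0)
timestamps = [base + timedelta(minutes=m) for m in (0, 5, 10, 30, 35)]
gaps = find_gaps(timestamps)
```

Here the single gap spans 20 minutes, so three readings (10:15, 10:20 and 10:25) are counted as missing.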

In Utrecht, the sampling interval of water level sensors at CSOs changes from 5 minutes to 1 minute when the water level exceeds the weir level. Figure 5 shows a correctly programmed sensor. The weir level of this CSO is 0.88 m+NAP, where NAP is the Dutch reference level. The switching between the two intervals causes the deviations on the right-hand side of the figure. Figure 6 shows an example of an incorrectly programmed sampling interval. The interval again depends on the water level. However, a clear relationship between water level and sampling interval is missing. Due to the enormous variation in sampling interval the measurements are less suitable for further analyses.

Figure 6. Incorrectly programmed measurement interval

Trends

The time series are also checked for monotone and sudden trends. Trends can be indicative of sensor drift as well as of gradual changes in the system itself. Most classical trend tests, however, cannot properly deal with the strong autocorrelation and the fast fluctuations of signals from sewer processes. As a result, the applied trend test has to account for both aspects. A seasonal Kendall test with correction for covariance of the signal is most appropriate for detecting linear trends in sewer measurements (Hirsch and Slack 1984, Dietz and Killeen 2006). Step trends can be detected by comparing the local variance with the variance of the complete time series. At locations where a possible trend is detected, the time series is split and a Wilcoxon rank-sum test or an Ansari-Bradley test is used to test for significant differences in mean and variance, respectively.

Initially, when a trend is detected the measurements are labelled as 'uncertain with trend'. However, when no cause for a detected trend can be found, the labels are manually changed into 'incorrect'. Figure 7 shows the impact of reconstruction works on system performance. The sudden decrease of the water level at locations NN01 and NN02 resulted from reconstruction works in the sewer system. Since the cause of the trend is known, the measurements are labelled as 'correct'. However, jumps in the measurements due to auto-calibration of a sensor result in the label 'incorrect', because the measurements are unsuitable for further application. Such jumps may result from maintenance of the sensor without a good protocol for reinstallation.
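The split-and-compare step for a suspected step trend can be illustrated with a two-sided Wilcoxon rank-sum test. The sketch below is a simplified stand-in and not the tool's code: it uses the normal approximation, applies no tie correction to the variance and ignores autocorrelation, which the text notes is substantial in sewer signals. The function name and the water levels are hypothetical.

```python
import math


def rank_sum_test(before, after):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p_value). Tied values receive midranks.
    """
    pooled = sorted(
        [(v, 0) for v in before] + [(v, 1) for v in after], key=lambda p: p[0]
    )
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(before), len(after)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical water levels (m) on either side of a suspected step.
before = [1.20, 1.22, 1.19, 1.21, 1.23, 1.18, 1.20, 1.22]
after = [0.95, 0.97, 0.94, 0.96, 0.98, 0.93, 0.95, 0.97]
z, p_value = rank_sum_test(before, after)
```

A small p-value only signals that the two segments differ in level. Whether the resulting label becomes 'correct' or 'incorrect' still depends on finding a cause, as described above.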

Figure 7. Detected step trend at two locations

Statistical model

In order to check the measurements with the statistical model, all sensors are clustered on the basis of an extensive correlation and regression analysis. The statistical model is site specific. Clusters consist of sensors that are strongly correlated. If necessary, the clustering is adjusted based on system characteristics. In Utrecht, 15 clusters have been defined. Rain gauges and turbidity sensors form separate clusters, whereas level and flow sensors are clustered together. An example of the behaviour of the sensors within a cluster is shown in Figure 8.

Figure 8. Behaviour of all sensors in cluster 2

The statistical model is based on linear regression. These models take the form

$y = \beta_0 + \beta_1 f_1(x) + \beta_2 f_2(x) + \dots + \beta_n f_n(x)$

The response y is modelled as a linear combination of (non-)linear functions of the explanatory variables x. More information on linear regression and time series analysis can be found in, e.g., Hamilton (1994). The regression models are trained for each cluster individually and re-calibrated during each validation round using stepwise regression (Draper and Smith 1981). This is a systematic method for adding explanatory variables to and removing them from a multi-linear model on the basis of their statistical significance. At each step, the p-value of an F-statistic is calculated in order to compare the models with and without a potential explanatory variable. For a variable that is currently not in the model, the null hypothesis is that it would have a zero coefficient if added to the model. If the null hypothesis can be rejected, the variable is added to the model. Conversely, terms that are currently in the model are removed if there is insufficient evidence to reject the null hypothesis of a zero coefficient. The method results in a regression model in which the most significant parameters are included.

The main advantage of stepwise regression is that malfunctioning sensors are automatically left out. An example is shown in Figure 9. The measurements at the internal weir of the storage tank do not correspond with the measurements at the external weir. The storage tank is filled, although the water level in the sewer system remains below weir level.

Figure 9. Incorrect measurements due to unexplained system behaviour

Quality labels

The results of the control steps are combined in an overall assessment of the measurement values. A quality label ('correct', 'uncertain' or 'incorrect') is attached to each individual measurement. If possible, the labels are made more specific (e.g. 'trend' or 'out of range') in order to save time when solving the problem. The validation results are presented graphically to the user by means of a web-based plug-in. An example of the results is shown in Figure 10. In order to prevent erroneous labelling of measurements, e.g. due to construction works in the sewer system, frequent consultation of the sewer manager is needed.
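One possible way to combine the results of the individual checks into an overall label is sketched below. The rule that the most severe label wins is an assumption made for illustration, since the text does not state how the tool resolves conflicting results. The names `SEVERITY` and `combine_labels` are hypothetical.

```python
# Assumed severity order: the label with the highest number wins.
SEVERITY = {"correct": 0, "uncertain": 1, "incorrect": 2}


def combine_labels(check_results):
    """Combine per-check labels into (overall_label, reasons).

    check_results maps a check name (e.g. 'out of range') to the label that
    check assigned. The names of the checks that produced the overall label
    are kept as the more specific reason.
    """
    overall = max(
        check_results.values(), key=SEVERITY.__getitem__, default="not labeled"
    )
    reasons = sorted(
        name
        for name, label in check_results.items()
        if label == overall and overall != "correct"
    )
    return overall, reasons


with_trend = combine_labels(
    {"out of range": "correct", "trend": "uncertain", "control model": "correct"}
)
out_of_range = combine_labels({"out of range": "incorrect", "trend": "uncertain"})
```

Keeping the name of the check next to the label corresponds to the more specific labels mentioned in the text, such as 'trend' or 'out of range'.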

Figure 10. Reported quality of all water level sensors for January 2007 (legend: correct, incorrect, out of range, uncertain, not labeled, missing value, double measurement)

RESULTS

The results show that data validation provides indispensable information on the performance of measurement equipment. Only measurements labelled 'uncertain' require further investigation. This includes checking the sensor performance on site. The label 'uncertain' is meant as a temporary quality label, i.e. each measurement labelled 'uncertain' should be checked manually in order to change the label to 'correct' or 'incorrect'.

The re-calibration during each validation round substantially improves the performance of the validation tool. Furthermore, aggregation of the measurements prior to labelling enables a more reliable assessment at a larger time scale. However, due to aggregation the measurements cannot be validated individually at the original time scale.

In Utrecht, approximately 8 million Euro is invested in the sewer system based on validated measurements. Other measures costing over 12 million Euro will be investigated in more detail because doubts have arisen regarding their effectiveness. An example of a measure based on validated data is the cleaning of sewers, where an increased number of CSO events per year is used as an indicator for blockage of a siphon. This leads to more reliable cleaning schemes. Furthermore, erroneous records in the database of the sewer system have been corrected on the basis of the validated measurements. Data validation also provides an excellent basis for contracts on reliability and availability of measurements between sewer manager and data provider.

CONCLUSION

The objective of this paper is to describe the application of automatic data validation for large sets of sewer measurements. The tool consists of a combination of tests to determine the quality of a measured value, including logical tests (missing values, out of range values, etc.), regression models and trend detection. It is provided to the user as a web-based plug-in. Measurements and quality labels are accessible through a geographical interface.

The application of the validation tool in Utrecht has highlighted the importance of data validation for improving the quality of monitoring programs in sewer systems. The following conclusions have been drawn:
- Sewer measurements require pre-treatment (e.g. interpolation and aggregation) to improve validation results.
- Each sensor type (level, rainfall, flow and turbidity) requires a specific combination of models to determine whether the measurements are correct.
- Re-calibration during each validation round and feedback to the sewer manager on changes in system behaviour improve the performance of the validation tool.
- Validated measurements provide a reliable basis for decision-making on investments in the sewer system.

In the near future, the validated measurements will also be applied for calibration of the hydrodynamic model of the sewer system.

REFERENCES

Dietz E.J. and Killeen T.J. (2006). A nonparametric multivariate test for monotone trend with pharmaceutical applications. Journal of the American Statistical Association, 76, 169-174.
Draper N. and Smith H. (1981). Applied Regression Analysis. Second Edition. Wiley, New York.
Hamilton J.D. (1994). Time Series Analysis. Princeton University Press, Princeton, New Jersey.
Hirsch R.M. and Slack J.R. (1984). A nonparametric trend test for seasonal data with serial dependence. Water Resources Research, 20(6), 727-732.
Mourad M. and Bertrand-Krajewski J.-L. (2002). A method for automatic validation of long time series of data in urban hydrology. Water Science and Technology, 45(4-5), 263-270.
Ottenhoff E.C., Korving H. and Clemens F.H.L.R. (2007). Automatic validation of large sets of sewer measurement data. In: Proceedings of the 3rd International IWA Conference on Automation in Water Quality Monitoring - AutMoNet2007, September 5-7, 2007, Gent, Belgium.
Rosen C., Röttorp J. and Jeppsson U. (2003). Multivariate on-line monitoring: challenges and solutions for modern wastewater treatment operation. Water Science and Technology, 47(2), 171-179.
Yoo C.K., Villez K., Lee I.B., Van Hulle S. and Vanrolleghem P.A. (2006). Sensor validation and reconciliation for a partial nitrification process. Water Science and Technology, 53(4-5), 513-521.