Sensor data management software, requirements and considerations. Don Henshaw H.J. Andrews Experimental Forest




Joint NERC Environmental Sensor Network/LTER SensorNIS Workshop, October 25-27, 2011

COMMON THEMES FROM PARTICIPATING SITES
Joint NERC Environmental Sensor Network/SensorNIS Workshop, Hubbard Brook Experimental Forest, NH, October 25-27, 2011

Approaches
  - Top-down (NEON, USGS): more uniform, faster implementation, less flexible
  - Bottom-up (LTER network sites, individual sites): less standardization, customized approaches and software
Adopting solutions: how do you decide?
  - Reluctance to invest time and energy
  - Lack of mature software
  - Steep learning curve
Do we need light-handed standardization?
  - E.g., software, methods, controlled vocabulary, units, etc.

COMMON THEMES FROM PARTICIPATING SITES

Greatest needs
  - Middleware between sensor/data logger and database/applications
  - Programming support
  - Training workshops to disseminate knowledge and solutions
  - Ways to share experiences with software and tools that are useful
  - Clearinghouse for sharing code and solutions
  - Knowledge Base (web page) organized by topics (http://im.lternet.edu/resources/im_practices/sensor_data)
Dataloggers
  - Campbell Scientific, most common (http://www.campbellsci.com/)
  - Hobo (www.onsetcomp.com)
  - Nexsens Technology (http://nexsens.com)
  - GRAPE, NEON (http://www.neoninc.org/)

SENSOR DATA MANAGEMENT MIDDLEWARE/SOFTWARE

Purpose of middleware
  - Data storage / data handling
  - Data aggregation, formatting, filtering
  - Documentation
  - Automated QA/QC on data streams
  - Archiving
Software/middleware
  - Proprietary
      - Campbell LoggerNet (most common)
      - Hobo
      - Vista Data Vision
      - YSI EcoNet
      - Custom applications: Matlab, Excel, SQL Server
  - Open source
      - Open source environments (see next slide)
      - Custom applications (Python, PHP, MySQL, etc.)

SENSOR DATA MANAGEMENT MIDDLEWARE: OPEN SOURCE ENVIRONMENTS FOR STREAMING DATA

  - Matlab GCE toolbox (proprietary / limited open source): GUI, visualization, metadata-based analysis; manages QA/QC rules and qualifiers; tracks provenance
  - Open Source DataTurbine Initiative: streaming data engine; receives data from various sources and sends it to analysis and visualization tools, databases, etc.; TiVo-like functionality
  - Kepler Project (open source): GUI; reuse and share analytical components/workflows with other users; tracks provenance; integrates software components and data sources
  - R-project libraries (open source): statistical and graphical capabilities, analysis tools, code reuse and sharing, integrated environment

SENSOR MANAGEMENT SYSTEM COMPONENTS

  - Develop protocols for installation of sensors
  - Develop calibration / maintenance schedules
      - System alerts (nagging system)
  - Build and maintain sensor network metadata
      - Data collection documentation
      - Annotation of sensor events / sensor history
  - Data workflows / processing
      - Quality control / flagging data
  - Archiving
      - Creating a citable database
      - Versioning (monthly vs. annual; provisional vs. final)
      - Periodic snapshots or queries
      - Tracking changes to the data, e.g., audit trail
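
A minimal sketch of the "nagging system" idea above: given a calibration schedule, report which sensors are overdue or coming due. The sensor names, dates, intervals, and the `calibrations_due` helper are all hypothetical and only illustrate the pattern; a real site would read the schedule from its own sensor metadata.

```python
from datetime import date, timedelta

# Hypothetical schedule: sensor id -> (last calibration date, interval in days).
SCHEDULE = {
    "airtemp_01": (date(2011, 4, 1), 180),
    "precip_01": (date(2011, 9, 15), 90),
}


def calibrations_due(schedule, today, warn_days=14):
    """Return (sensor_id, due_date, status) for sensors that are overdue
    or whose calibration falls due within the next `warn_days` days."""
    alerts = []
    for sensor_id, (last_cal, interval_days) in sorted(schedule.items()):
        due = last_cal + timedelta(days=interval_days)
        if due < today:
            alerts.append((sensor_id, due, "OVERDUE"))
        elif due <= today + timedelta(days=warn_days):
            alerts.append((sensor_id, due, "DUE SOON"))
    return alerts


for sensor_id, due, status in calibrations_due(SCHEDULE, today=date(2011, 10, 25)):
    print(f"{status}: {sensor_id} calibration was due {due.isoformat()}")
```

In practice the returned list could feed an e-mail or dashboard alert; the 14-day warning window is an arbitrary choice here.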

KEY METADATA FOR SENSOR NETWORKS

Sensor descriptions
  - Sensor relocations or replacement (automate w/ barcodes?)
  - Sensor events and failures
  - Calibration events / maintenance history
Data collection documentation
  - Sensor deployment information, incl. station-level
  - Geo-location / operational time span
  - Sampling frequency
  - Methodology changes, e.g., temperature radiation shield change
  - Photo points for station, e.g., hemispheric photos, to track local conditions and changes
Data processing documentation
  - System requirements / hardware configuration / history
  - Data processing workflow
  - Datalogger program versions / wiring diagrams with labels
  - Attribute / flag definitions
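
One possible way to hold a subset of these metadata items in code is a small record type per deployment with an attached event history. This is only an illustrative sketch: the class names, field names, and example values (including the placeholder coordinates) are assumptions, not part of any standard or of the workshop material.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SensorEvent:
    when: date
    kind: str  # e.g. "calibration", "failure", "relocation", "method_change"
    note: str = ""


@dataclass
class SensorDeployment:
    sensor_id: str
    description: str
    station: str
    latitude: float
    longitude: float
    start: date
    end: Optional[date] = None  # None means the sensor is still operating
    sampling_interval_s: int = 900
    logger_program_version: str = ""
    events: List[SensorEvent] = field(default_factory=list)

    def events_of_kind(self, kind):
        """Return the recorded events of one kind, in insertion order."""
        return [e for e in self.events if e.kind == kind]


# Hypothetical deployment; coordinates are placeholders, not a real station.
probe = SensorDeployment(
    sensor_id="airtemp_01",
    description="air temperature probe, 1.5 m",
    station="STATION_A",
    latitude=0.0,
    longitude=0.0,
    start=date(2011, 1, 1),
)
probe.events.append(SensorEvent(date(2011, 6, 1), "calibration", "two-point check"))
probe.events.append(SensorEvent(date(2011, 8, 15), "method_change", "radiation shield replaced"))

print([e.when.isoformat() for e in probe.events_of_kind("calibration")])
```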

SENSOR MANAGEMENT

SensorML standard (http://www.opengeospatial.org/standards/sensorml)
  - Framework for observational characteristics of sensors
  - Covers station, deployment, sensor, parameter
  - Tools being developed; lacking production-grade software?
CUAHSI HIS / Observation Data Model
  - Relational database model for individual observations
  - Provides maximum flexibility in data analysis through the ability to query and select individual observation records
  - Record-level metadata

Software Tools for Sensor Networks Training, 1 May 2012
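
To illustrate what "one row per observation with record-level metadata" can look like, here is a toy table built with Python's standard `sqlite3` module. The table and column names are invented for this sketch and are not the actual CUAHSI Observation Data Model schema; the rows are made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE observation (
        site          TEXT NOT NULL,
        variable      TEXT NOT NULL,
        obs_time      TEXT NOT NULL,    -- ISO 8601 text
        value         REAL,
        qualifier     TEXT,             -- record-level flag, NULL if none
        quality_level INTEGER NOT NULL
    )
    """
)

# Hypothetical observations.
rows = [
    ("siteA", "air_temp_c", "2011-10-25T00:00", 4.2, None, 1),
    ("siteA", "air_temp_c", "2011-10-25T00:15", 4.1, None, 1),
    ("siteA", "air_temp_c", "2011-10-25T00:30", 99.9, "suspicious", 1),
    ("siteB", "air_temp_c", "2011-10-25T00:00", 3.8, None, 1),
]
conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?, ?, ?)", rows)


def unflagged(conn, site, variable):
    """Select individual observations for one site/variable that carry no qualifier."""
    cur = conn.execute(
        "SELECT obs_time, value FROM observation "
        "WHERE site = ? AND variable = ? AND qualifier IS NULL "
        "ORDER BY obs_time",
        (site, variable),
    )
    return cur.fetchall()


print(unflagged(conn, "siteA", "air_temp_c"))
```

Because every value is its own record, a user can filter by site, variable, time, or qualifier without reshaping the data first.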

QUALITY ASSURANCE: PREVENTATIVE MEASURES

Sensor redundancy
  - Ideal: triple the sensor, triple the logger!
  - Practical: cheaper, lower-resolution sensors, or correlated (proxy) sensors
  - Side effect: establishes user confidence in data products
Routine calibration and maintenance
  - Schedule or stagger to minimize data loss
Continuous monitoring and evaluation of the sensor network
  - Early detection of problems
  - Limited budgets and increased data volume preclude past manual sensor auditing practices
  - Automated alerts or streaming QC

QUALITY CONTROL ON STREAMING DATA: QUALITY LEVELS

Quality control is performed at multiple levels
  - Quality level is important to describe in metadata
  - Description differs among programs
  - Examples: NASA, CUAHSI, AmeriFlux
      - http://ilrs.gsfc.nasa.gov/reports/ilrs_reports/9809_attach7a.html
      - http://his.cuahsi.org/documents/odm1.pdf (p. 18)
Level 0 (raw streaming data)
  - Raw data, no QC, no data qualifiers (data flags) applied
  - Preservation of original data streams is essential
  - Some datalogger conversion of units and formats may be acceptable (Level ½)

QUALITY CONTROL ON STREAMING DATA: QUALITY LEVELS

Level 1 (QC'd, calibrated data, qualifiers added)
  - Provisional level (near real-time preparation)
      - Typically for internal use; if released, provisional data must be labeled clearly
      - Data qualifiers are added from initial QC
      - Infill and flag missing datetimes
  - Published level (delayed release)
      - QC process is complete
      - Data is unlikely to change; data set is unique
Comments
  - Only logger missing-value codes should be deleted from streams, e.g., contention that even impossible values may carry information
  - Immediate QC is important in near real-time, with subsequent QC as trends become apparent (e.g., sensor drift, degradation)
  - Foremost, identify what QC has been done
      - Identify streaming QC methods, thresholds, assumptions
      - If no QC has been done, that should be made clear too
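
The "infill and flag missing datetimes" step can be sketched as follows. This assumes a fixed logging interval and records that all fall on that grid; the function name and the flag code "M" are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta


def infill_missing(records, step):
    """Insert a placeholder row for every missing timestamp.

    records: list of (datetime, value) sorted by time on a fixed `step` grid.
    Returns a list of (datetime, value, flag); inserted rows have value None
    and flag "M" (a hypothetical 'missing' code), original rows have flag "".
    """
    if not records:
        return []
    start, end = records[0][0], records[-1][0]
    for t, _ in records:
        if (t - start) % step != timedelta(0):
            raise ValueError(f"timestamp {t} is not on the expected grid")
    by_time = dict(records)
    out = []
    t = start
    while t <= end:
        if t in by_time:
            out.append((t, by_time[t], ""))
        else:
            out.append((t, None, "M"))
        t += step
    return out


# Hypothetical 15-minute series with the 00:30 record missing.
raw = [
    (datetime(2011, 10, 25, 0, 0), 4.2),
    (datetime(2011, 10, 25, 0, 15), 4.1),
    (datetime(2011, 10, 25, 0, 45), 4.0),
]
for row in infill_missing(raw, timedelta(minutes=15)):
    print(row)
```

Off-grid timestamps raise an error here rather than being silently dropped, in keeping with the point that original values should be preserved.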

QUALITY CONTROL ON STREAMING DATA: POSSIBLE QUALITY CONTROL CHECKS IN NEAR REAL-TIME

Timestamp integrity (date/time)
  - Sequential, fixed intervals, i.e., checks for time step or frequency variation
Range checks
  - Sensor specifications: identify impossible values, not unlikely ones
  - Seasonal/reasonable historic values
  - Highly dependent on the sensor; should be based on domain expertise
Variance checks (indicator of sensor degradation)
  - Running averages or change-in-slope checks, e.g., outlier detection, spikes
  - Sensitivity is specific to site and sensor type
Persistence checks
  - Check for repeating values that may indicate sensor failure
  - E.g., freezing, sensor capacity issues
Internal (plausibility) checks
  - E.g., TMAX - TMIN > 0, snow depth > SWE
  - Consistency of derived values
Spatial checks
  - Use redundant or related sensors, e.g., to detect sensor drift
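
Several of the checks listed above can be sketched in a few lines each. The functions below are illustrative only: their names, the thresholds, and the sample series are assumptions, the spike test is one simple change-in-slope variant among many, and the value-based checks assume no missing (None) values. Real limits should come from sensor specifications and site expertise.

```python
from datetime import datetime, timedelta


def timestamp_flags(times, step):
    """True where the gap from the previous timestamp is not exactly `step`."""
    if not times:
        return []
    return [False] + [cur - prev != step for prev, cur in zip(times, times[1:])]


def range_flags(values, low, high):
    """True where a value lies outside the physically possible range."""
    return [not (low <= v <= high) for v in values]


def spike_flags(values, max_step):
    """True where a point jumps more than `max_step` away from its previous
    neighbour and then back again (endpoints are never flagged)."""
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        d_prev = values[i] - values[i - 1]
        d_next = values[i + 1] - values[i]
        if abs(d_prev) > max_step and abs(d_next) > max_step and d_prev * d_next < 0:
            flags[i] = True
    return flags


def persistence_flags(values, max_repeats):
    """True for every value in a run of identical values longer than `max_repeats`."""
    flags = [False] * len(values)
    run_start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[run_start]:
            if i - run_start > max_repeats:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = i
    return flags


def tmax_tmin_ok(tmax, tmin):
    """Internal plausibility check from the list above: TMAX - TMIN > 0."""
    return tmax - tmin > 0


# Hypothetical 15-minute air temperature series (deg C) with one missing record.
start = datetime(2011, 10, 25, 0, 0)
step = timedelta(minutes=15)
times = [start + i * step for i in (0, 1, 2, 3, 4, 5, 6, 8)]
temps = [4.2, 4.1, 35.0, 4.0, 4.0, 4.0, 4.0, -99.0]

print("timestamp  ", timestamp_flags(times, step))
print("range      ", range_flags(temps, -40.0, 50.0))
print("spike      ", spike_flags(temps, max_step=10.0))
print("persistence", persistence_flags(temps, max_repeats=3))
print("tmax/tmin  ", tmax_tmin_ok(12.5, 3.0))
```

Each function returns one boolean per record, so the results can be combined into per-record qualifiers without altering the original values.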

QUALITY CONTROL ON STREAMING DATA: DATA LEVELS: GAP FILLING

Level 2: gap-filled, estimated, or aggregated data
  - Involves interpretation: multiple algorithms are possible, and different methods will lead to different products
  - Some researchers may still want to download Level 1 data to apply preferred methods
Obligation to provide gap filling?
  - Controversial: can seriously compromise statistics and analyses and lead to misinterpretation
  - Desirable when generating summarized data, but transparency is critical
  - Probably unsuitable for streaming data; belongs much later in the data cycle, with expert attention
  - Most critical to document gap-filling and flag all estimated values to allow removal
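
The last point, flagging every estimated value so it can be removed again, can be sketched with simple linear interpolation. Linear interpolation is only one of many possible gap-filling methods and is chosen here for brevity; the function names and the flag codes "E" (estimated) and "M" (missing) are hypothetical.

```python
def fill_gaps_linear(values, max_gap):
    """Linearly interpolate interior gaps of at most `max_gap` consecutive
    None values. Returns (filled, flags): estimated values are flagged "E";
    gaps that are too long or touch either end stay None and are flagged "M"."""
    filled = list(values)
    flags = ["" if v is not None else "M" for v in values]
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        j = i
        while j < n and values[j] is None:
            j += 1
        gap = j - i  # the gap covers indexes i .. j-1
        if i > 0 and j < n and gap <= max_gap:
            left, right = values[i - 1], values[j]
            for k in range(i, j):
                frac = (k - i + 1) / (gap + 1)
                filled[k] = left + frac * (right - left)
                flags[k] = "E"
        i = j
    return filled, flags


def strip_estimates(filled, flags):
    """Undo gap filling: set every value flagged "E" back to None."""
    return [None if f == "E" else v for v, f in zip(filled, flags)]


# Hypothetical series: a two-step interior gap and a trailing gap.
series = [1.0, None, None, 4.0, None]
filled, flags = fill_gaps_linear(series, max_gap=2)
print(filled)
print(flags)
print(strip_estimates(filled, flags) == series)
```

Because the flags travel with the values, a user who prefers a different method can recover the unfilled series and apply it.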

QUALITY CONTROL ON STREAMING DATA: DATA QUALIFIERS

Many vocabularies: desirable to harmonize, but impractical (may crosswalk across vocabularies)
Good approach
  - Rich vocabulary of fine-grained flags for streaming data, intended to guide local review
  - Simpler vocabulary of flags for final data for public consumption, e.g., Verified, Accepted, Suspicious, Missing, Estimated
  - Pass-fail indicator (include in analysis?)
Certain types of qualifiers may be better as data columns
  - Method shifts, sensor shifts
Place key documentation as close to the data value as possible
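
A crosswalk from a fine-grained local vocabulary to the simpler public one can be as small as a lookup table plus a precedence rule. The single-letter local codes and the precedence order below are assumptions made for this sketch; only the five public terms come from the list above.

```python
# Hypothetical fine-grained streaming flags -> public vocabulary.
CROSSWALK = {
    "R": "Suspicious",  # failed range check
    "S": "Suspicious",  # spike detected
    "P": "Suspicious",  # persistence (stuck value)
    "M": "Missing",
    "E": "Estimated",
    "V": "Verified",    # manually reviewed
}

# Assumed precedence when one record carries several local flags.
PRIORITY = ["Missing", "Estimated", "Suspicious", "Verified", "Accepted"]


def public_flag(fine_flags):
    """Collapse a collection of local flags into one public flag.

    A record with no local flags is reported as "Accepted".
    An unknown local flag raises KeyError so it cannot pass unnoticed.
    """
    mapped = {CROSSWALK[f] for f in fine_flags} or {"Accepted"}
    for term in PRIORITY:
        if term in mapped:
            return term
    raise AssertionError("unreachable: every mapped term is in PRIORITY")


print(public_flag(["R", "S"]))  # two local flags, one public term
print(public_flag([]))
print(public_flag(["M", "R"]))
```

Keeping the table in one documented place also gives data users the flag definitions alongside the data.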

KNOWLEDGE BASE: A BEST PRACTICES MANUAL FOR NETWORKED SENSORS

Online guide that summarizes the community's collective knowledge
  - Organize by topics
      - Summary of topic
  - Populate through community crowdsourcing
      - 1-pagers to highlight expertise, experience
      - Cite various protocols, e.g., USGS, NOAA, NERRS, NEON, US Forest Service, CUAHSI
  - Let a page manager be responsible for updating the summary periodically
  - Discussion blog associated with each topic
http://im.lternet.edu/resources/im_practices/sensor_data