Online and Scalable Data Validation in Advanced Metering Infrastructures



Similar documents
Efficient Processing for Big Data Streams and their Context in Distributed Cyber Physical Systems

Tolerating Late Timing Faults in Real Time Stream Processing Systems. Stuart Perks

Streaming items through a cluster with Spark Streaming

Real-time distributed Complex Event Processing for Big Data scenarios

Prevention, Detection, Mitigation

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

Big Data and Advanced Analytics Technologies for the Smart Grid

Application Performance Analysis and Troubleshooting

Real Time Analytics for Big Data. NtiSh Nati

Real-time Big Data Analytics with Storm

Big Data Analysis: Apache Storm Perspective

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

METIS: a Two-Tier Intrusion Detection System for Advanced Metering Infrastructures

Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

MASSIF: A Highly Scalable SIEM

Architectures for massive data management

Managing Latency in IPS Networks

Big Data Analytics - Accelerated. stream-horizon.com

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Streamdrill: Analyzing Big Data Streams in Realtime

Massive Cloud Auditing using Data Mining on Hadoop

Complex Event Processing (CEP) - A Primer

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON

Enabling Cloud Architecture for Globally Distributed Applications

Fast Data in the Era of Big Data: Twitter s Real-

Apache Kafka Your Event Stream Processing Solution

Real Time Big Data Processing

Effects of Filler Traffic In IP Networks. Adam Feldman April 5, 2001 Master s Project

Apache Storm vs. Spark Streaming Two Stream Processing Platforms compared

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges

Net Optics Learning Center Presents The Fundamentals of Passive Monitoring Access

The Fundamentals of Intrusion Prevention System Testing

Case Study: Real-time Analytics With Druid. Salil Kalia, Tech Lead, TO THE NEW Digital

UNIVERSITY OF INFINITE AMBITIONS. MASTER OF SCIENCE COMPUTER SCIENCE DATA SCIENCE AND SMART SERVICES

Middleware support for the Internet of Things

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Streaming Big Data Performance Benchmark for Real-time Log Analytics in an Industry Environment

Dr. John E. Kelly III Senior Vice President, Director of Research. Differentiating IBM: Research

Streaming Big Data Performance Benchmark. for

Scalability and Design Patterns

Real-time Analytics at Facebook: Data Freeway and Puma. Zheng Shao 12/2/2011

Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May Santa Clara, CA

OpenDaylight Performance Stress Tests Report

MEN'S FASHION UK Items are ranked in order of popularity.

Implementing Large-Scale Autonomic Server Monitoring Using Process Query Systems. Christopher Roblee Vincent Berk George Cybenko

How to Build a Massively Scalable Next-Generation Firewall

Enabling Real-Time Sharing and Synchronization over the WAN

A New Approach to Network Visibility at UBC. Presented by the Network Management Centre and Wireless Infrastructure Teams

Performance Beyond PCI Express: Moving Storage to The Memory Bus A Technical Whitepaper

Beyond Watson: The Business Implications of Big Data

REAL-TIME STREAMING ANALYTICS DATA IN, ACTION OUT

A framework for easy development of Big Data applications

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

NVIDIA Tools For Profiling And Monitoring. David Goodwin

Trading. Next Generation Monitoring. James Wylie Senior Manager, Product Marketing

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi,

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH CERN ACCELERATORS AND TECHNOLOGY SECTOR A REMOTE TRACING FACILITY FOR DISTRIBUTED SYSTEMS

Apache S4: A Distributed Stream Computing Platform

STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA. Processing billions of events every day

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY

I/O virtualization. Jussi Hanhirova Aalto University, Helsinki, Finland Hanhirova CS/Aalto

How To Choose A Data Flow Pipeline From A Data Processing Platform

Security and QoS requirements in Telemedicine. Kevin Wang CSCI E-139

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Product Overview. Dream Report. OCEAN DATA SYSTEMS The Art of Industrial Intelligence. User Friendly & Programming Free Reporting.

Design Patterns for Large Scale Data Movement. Aaron Lee

MIDeA: A Multi-Parallel Intrusion Detection Architecture

How To Write A Data Processing Pipeline In R

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS

Frequently Asked Questions

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

Rakam: Distributed Analytics API

Distributed Monitoring Pervasive Visibility & Monitoring, Selective Drill-Down

Datacenter Operating Systems

Applied Detection and Analysis Using Network Flow Data

I/O Considerations in Big Data Analytics

Agenda. Some Examples from Yahoo! Hadoop. Some Examples from Yahoo! Crawling. Cloud (data) management Ahmed Ali-Eldin. First part: Second part:

CAPTURING & PROCESSING REAL-TIME DATA ON AWS

Stateful Firewalls. Hank and Foo

Chapter 5: Stream Processing. Big Data Management and Analytics 193

Transcription:

Online and Scalable Data Validation in Advanced Metering Infrastructures Chalmers University of technology

Agenda 1. Problem statement 2. Preliminaries Data Streaming 3. Streaming-based Data Validation 4. Conclusions 2

Agenda 1. Problem statement 2. Preliminaries Data Streaming 3. Streaming-based Data Validation 4. Conclusions 3

Online and Scalable Data Validation in Advanced Metering Infrastructures Demand/Response, Real-time pricing, Intrusion Detection How can we validate data given that There is a large volume of continuous data demanding for distributed and parallel analysis Validation rules depend on installation-specific features such as brands, devices, protocols, System experts should define installationspecific validation rules? Noisy and Lossy data: bad-calibrated / faulty devices, lossy communication, malicious users,

Agenda 1. Problem statement 2. Preliminaries Data Streaming 3. Streaming-based Data Validation 4. Conclusions 5

Data Streaming - Motivation Financial applications, sensor networks monitoring, require Continuous processing of data streams Real Time fashion Store and process is not feasible Financial markets: process millions of messages/second, take fast decisions (< 100 microseconds) Data Streaming: In memory Bounded resources Efficient one-pass analysis 6

Data Streaming - System Model Data Stream: unbounded sequence of tuples Example: consumption readings Field House Device Time (secs) Consumption (kwh) Field text text int double A B 8:00 1 C D 8:20 2 A E 8:35 4 time 7

Data Streaming - System Model Operators: OP Stateless 1 input tuple 1 output tuple OP Stateful 1+ input tuple(s) 1 output tuple 8

Data Streaming - System Model Infinite sequence of tuples / bounded memory windows Example: 1 hour windows [8:00,9:00) time [8:20,9:20) [8:40,9:40) 9

Data Streaming - System Model Infinite sequence of tuples / bounded memory windows Example: 1 hour windows Counter: 1 Counter: 3 Counter: 2 Counter: 4 8:05 8:15 8:22 8:45 9:05 [8:00,9:00) time Output: 4 10

Data Streaming - System Model Infinite sequence of tuples / bounded memory windows Example: 1 hour windows Counter: 3 8:05 8:15 8:22 8:45 9:05 [8:20,9:20) time 11

Agenda 1. Problem statement 2. Preliminaries Data Streaming 3. Streaming-based Data Validation 4. Conclusions 12

Why Streaming-based data Validation? Expressive Online Parallel & Distributed

Sample Streaming-based Data Validation: Interpolate missing consumption values Expressive Online Aggregate Filter Map Field Device Time Consumption Match consecutive readings from each meter Field Device Time 1 Consumption 1 Time 2 Consumption 2 Forward if time distance exceeds a certain threshold Field Device Time 1 Consumption 1 Time 2 Consumption 2 Interpolate missing values Field Device Time Consumption

Sample Streaming-based Data Validation: Interpolate missing consumption values A F M Expressive Online Parallel & Distributed M A A F A F M F A F M A F A F M

Evaluation - Setup From a real-world AMI, data extracted from 50 meters Data covers 13 months (May 2012 June 2013) Implemented on top of Storm, a widely-used SPE (e.g., used in Twitter) Evaluated in terms of throughput (tuples/second) and processing latency (milliseconds)

Evaluation

Agenda 1. Problem statement 2. Preliminaries Data Streaming 3. Streaming-based Data Validation 4. Conclusions 18

Conclusions Streaming-based Data Validation Expressive / Distributed-Parallel / Online Implemented on top of the Storm SPE Evaluated with real-world AMI data