Efficient Processing for Big Data Streams and their Context in Distributed Cyber Physical Systems



Efficient Processing for Big Data Streams and their Context in Distributed Cyber Physical Systems
Department of Computer Science and Engineering
Chalmers University of Technology & Gothenburg University
Gothenburg, Sweden

Prelude
- Assoc. prof., Chalmers University of Technology & Gothenburg University, Sweden
- Center for Mathematics & Computer Science, Netherlands
- Max Planck Institute for Computer Science, Germany
- Chalmers: assistant professor (forskarassistent)
- PhD (1996), University of Patras, Greece
- Computer Science and Engineering, Distributed Computing

Roadmap
- Cyberphysical systems, big data, streams and distributed systems: how they belong together
- At our research team
- Concluding discussion

Examples: Cyber Physical System (CPS)
- Adaptive electricity grids
(figures: www.energy daily.com/images/, http://www.kapsch.net/se/)

Cyberphysical systems as layered systems
[Figure labels: cyber system; communication link; sensing + computing + communicating device, aka Internet of Things (IoT); physical system]

CPS/IoT => big numbers of devices and/or big data rates => big volumes of events/data!
Why this complexity?
- (Smart) adaptive use of resources
- Possibilities for improvement, e.g. energy consumption, traffic bandwidth, early warnings, improved system quality
[The 4th industrial (r)evolution, presentation by S. Jeschke, 2013]

Info needed in near real time
Is store & process (DB) a feasible option?
- High-rate sensors, high-speed networks, social media, financial records: up to millions of messages per second
- Decisions must be taken really fast, e.g. within fractions of a msec, even μsecs
- As of today, only 0.1% of the available sensor data is analyzed, mainly offline (i.e., afterwards, not in or close to real time) [Jonathan Ballon, Chief Strategy Officer, General Electric]
Data streaming:
- In-memory, in-network, distributed
- Locality, use of available resources
- Efficient one-pass analysis & filtering
(fig: V. Gulisano)
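To make the contrast with store & process concrete, here is a minimal sketch of one-pass analysis and filtering. It keeps only a running count and mean in memory and never stores the stream. The function name `one_pass_alerts` and the threshold are hypothetical and chosen for illustration only.

```python
def one_pass_alerts(stream, threshold):
    """Scan the stream once, keeping O(1) state (count and running mean).

    Yields (value, running_mean) for every value above `threshold`.
    Nothing is stored, so the input may be unbounded.
    """
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental mean update: no need to keep earlier values.
        mean += (value - mean) / count
        if value > threshold:
            yield value, mean


if __name__ == "__main__":
    readings = iter([1.0, 3.0, 10.0, 2.0])
    for value, mean in one_pass_alerts(readings, threshold=5.0):
        print(f"alert: value={value}, running mean={mean:.2f}")
```

Because the function is a generator, downstream consumers see each alert as soon as the triggering value arrives, rather than after a batch has been stored and queried.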

Data streaming components
- Distributed input sources generating streams of data (unbounded sequences of tuples, time series)
- Continuous query(-ies): a graph of data streaming operators/tasks. Can be used to:
  - filter / modify tuples
  - aggregate tuples, join streams
- Input/output & processing can involve multiple parallel threads
- Stateful operations are computed over windows
- [State-of-the-art literature] parallelization in operator implementations, but single-point bottlenecks can still persist
- Challenges: throughput, latency, determinism, load balancing, fault tolerance
(fig: V. Gulisano)
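As an illustration of the components listed above, the following sketch chains a stateless filter with a stateful aggregate over tumbling windows to form a tiny continuous query. It is single-threaded and assumes tuples of the form (timestamp, key, value) arriving in non-decreasing timestamp order. The operator names `filter_op` and `tumbling_window_sum` are hypothetical and do not come from any particular streaming engine.

```python
from collections import defaultdict


def filter_op(stream, predicate):
    """Stateless operator: forward only tuples that satisfy the predicate."""
    for tup in stream:
        if predicate(tup):
            yield tup


def tumbling_window_sum(stream, size):
    """Stateful operator: per-key sum over consecutive windows of `size` time units.

    Assumes timestamps are non-decreasing. A window's results are emitted
    when the first tuple of a later window arrives.
    """
    current = None              # start time of the window being filled
    sums = defaultdict(float)   # per-key state for the current window
    for ts, key, value in stream:
        start = (ts // size) * size
        if current is not None and start != current:
            for k in sorted(sums):
                yield current, k, sums[k]
            sums.clear()
        current = start
        sums[key] += value
    # Flush the last window; only reached because the demo input is bounded.
    if current is not None:
        for k in sorted(sums):
            yield current, k, sums[k]


if __name__ == "__main__":
    source = [(0, "a", 1.0), (5, "b", 2.0), (7, "a", -1.0), (12, "a", 4.0)]
    query = tumbling_window_sum(filter_op(source, lambda t: t[2] >= 0), size=10)
    for result in query:
        print(result)
```

The window operator is the only place that holds state, which is also where parallel implementations typically need synchronization.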

Fine-grain parallelism: Parallel Data Streaming
At CTH: enhanced parallelism by means of dedicated, semantic-aware concurrent data objects and efficient algorithmic implementations of their fine-grain synchronization
(fig: V. Gulisano, R. Rodriguez)

Examples of results with ScaleGate
- Latency and throughput scaling, while keeping processing fault tolerant and deterministic (aggregation and join operations)
- Compared configurations: baseline (Borealis, StreamCloud) with FIFO queue; baseline with lock-free FIFO; ScaleGate-based
- Shifts the saturation point of the pipeline: heavier streams can be processed with the same computing capacity, many times faster, at Mtuples/sec
[CGNPT ACM SPAA2014, GNPT IEEE BigData2015]
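The slide does not show how ScaleGate works internally. As a rough, sequential illustration of what deterministic merging of several timestamp-ordered streams can look like, the sketch below releases a tuple only once every source has delivered something newer, so the output order does not depend on how arrivals interleave. The class `ReadyMerger` and its methods are hypothetical names. This is not the concurrent implementation from the cited papers, and it assumes each source delivers non-decreasing timestamps.

```python
import heapq


class ReadyMerger:
    """Sequential sketch of deterministic merging of timestamp-sorted sources."""

    def __init__(self, num_sources):
        self._latest = [None] * num_sources  # latest timestamp seen per source
        self._heap = []                      # pending tuples, ordered by (ts, source, seq)
        self._seq = 0                        # tie-breaker; preserves per-source order

    def add(self, source, ts, payload):
        """Insert a tuple from `source`; timestamps per source must not decrease."""
        self._latest[source] = ts
        heapq.heappush(self._heap, (ts, source, self._seq, payload))
        self._seq += 1

    def pop_ready(self):
        """Return tuples that no future arrival can precede, in deterministic order."""
        if any(t is None for t in self._latest):
            return []  # some source has not delivered anything yet
        watermark = min(self._latest)
        ready = []
        # Strict '<': a source may still deliver a tuple equal to the watermark.
        while self._heap and self._heap[0][0] < watermark:
            ts, source, _, payload = heapq.heappop(self._heap)
            ready.append((ts, source, payload))
        return ready


if __name__ == "__main__":
    merger = ReadyMerger(num_sources=2)
    merger.add(0, 1, "a")
    merger.add(1, 2, "b")
    merger.add(0, 3, "c")
    print(merger.pop_ready())  # only tuples older than every source's latest timestamp
```

A concurrent version has to make `add` and `pop_ready` safe for many threads without serializing them, and that is where the single-point bottleneck of a plain FIFO queue tends to appear.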

Examples of use cases: Geospatial monitoring
Deterministic Real-Time Analytics of Geospatial Data Streams through ScaleGate Objects
http://www.chalmers.se/en/departments/cse/news/pages/debs2015.aspx
- Best Solution Grand Challenge Award: 9th ACM SIGMOD SIGSOFT International Conference on Distributed Event Based Systems 2015
- Top-k frequent routes, profitable cells (near-real-time, window-based streaming)
- > 110,000 tuples/sec throughput, < 46 msec latency
[GNWPT ACM DEBS 2015]
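A window-based top-k query of this kind can be sketched as follows. This is a simple single-threaded illustration, not the awarded solution. A route is assumed to be a pair of (origin cell, destination cell), and `TopKRoutes`, its parameters and the cell labels are hypothetical.

```python
from collections import Counter, deque


class TopKRoutes:
    """Top-k most frequent routes over a time-based sliding window."""

    def __init__(self, window_seconds, k):
        self.window = window_seconds
        self.k = k
        self.events = deque()    # (timestamp, route), oldest first
        self.counts = Counter()  # route -> occurrences inside the window

    def update(self, ts, route):
        """Add one trip and return the current top-k as [(route, count), ...]."""
        self.events.append((ts, route))
        self.counts[route] += 1
        # Expire trips that have fallen out of the window (ts - window, ts].
        while self.events and self.events[0][0] <= ts - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        # Sort by descending count, then by route, so ties break deterministically.
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[: self.k]


if __name__ == "__main__":
    top = TopKRoutes(window_seconds=60, k=2)
    top.update(0, ("A", "B"))
    top.update(10, ("A", "B"))
    top.update(20, ("C", "D"))
    print(top.update(65, ("C", "D")))
```

Re-sorting all counts on every update is the simplest choice for a sketch. At the rates quoted on the slide, an incrementally maintained ranking would be the natural next step.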

Examples of use cases: Advanced Metering Infrastructure
- Efficient temporal-spatial clustering for online identification of critical events (even when the communication is unreliable)
- Sliding window over time
- Grid-based Single-Linkage Clustering (G-SLC)
[FALP IEEE BigData2014]
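The general idea of combining a grid with single-linkage clustering can be sketched as below: events are hashed into grid cells of side `eps`, so that any two events within distance `eps` lie in the same or in adjacent cells, and linked events are merged with union-find. This is a batch, spatial-only illustration over the contents of one window, not the G-SLC algorithm from the cited paper. The function name and parameters are hypothetical.

```python
from collections import defaultdict
from math import floor, hypot


def grid_single_linkage(points, eps):
    """Cluster 2D points: two points are linked if their distance is <= eps.

    Returns clusters as sorted lists of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        # Find the cluster representative, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    grid = defaultdict(list)  # (cell_x, cell_y) -> indices of points in that cell
    for idx, (x, y) in enumerate(points):
        cx, cy = floor(x / eps), floor(y / eps)
        # Only the 3x3 neighbourhood can contain points within eps.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for other in grid.get((cx + dx, cy + dy), ()):
                    ox, oy = points[other]
                    if hypot(x - ox, y - oy) <= eps:
                        union(idx, other)
        grid[(cx, cy)].append(idx)

    clusters = defaultdict(list)
    for idx in range(len(points)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


if __name__ == "__main__":
    events = [(0.0, 0.0), (0.5, 0.0), (1.2, 0.0), (5.0, 5.0)]
    print(grid_single_linkage(events, eps=1.0))
```

The grid keeps the neighbour search local, which is what makes per-event updates cheap enough to consider in a streaming setting.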

Examples of use cases: Advanced Metering Infrastructure
- Efficient data validation on the fly
- Noisy and lossy data: badly calibrated or faulty devices, lossy communication
- E.g. scaling to 25 million meters/hourly readings on a mainstream 6-core platform [GAP IEEE ISGT 2014]
- Plus differentially private aggregation [ongoing work]
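On-the-fly validation can be pictured as a streaming operator that tags each reading as it passes, keeping only a small amount of state per meter. The sketch below applies two illustrative rules, a plausibility range (noisy or faulty devices) and a gap check (lossy communication). The rules, the thresholds and the name `validate` are assumptions made for the example and are not taken from the cited work.

```python
def validate(readings, max_kwh=50.0, expected_interval=3600):
    """Tag each (meter, timestamp, kwh) reading with the validation rules it violates.

    Yields (meter, timestamp, kwh, issues). State is one timestamp per meter.
    Thresholds are illustrative placeholders.
    """
    last_ts = {}  # meter id -> timestamp of its previous reading
    for meter, ts, kwh in readings:
        issues = []
        if kwh < 0 or kwh > max_kwh:
            issues.append("out_of_range")  # e.g. faulty or badly calibrated device
        prev = last_ts.get(meter)
        if prev is not None and ts - prev > expected_interval:
            issues.append("gap")           # e.g. readings lost in communication
        last_ts[meter] = ts
        yield meter, ts, kwh, issues


if __name__ == "__main__":
    stream = [("m1", 0, 1.0), ("m1", 3600, -2.0), ("m1", 10800, 1.5)]
    for record in validate(stream):
        print(record)
```

Since the state is keyed by meter, readings could be partitioned by meter id across threads without the rules interfering with each other.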

Summarizing & Concluding
- DS^2: DataStreaming * DataStructures, i.e. efficient multicore stream processing
- Efficient algorithmic (in-memory) stream analysis
- Advancing the state of the art in big data stream analysis (context: IoT/CPS; relates to Cloud/Fog computing)
- Important to design algorithms that communicate as little as possible; efficient processing and data analysis need to be unified [J. Dongarra, D. Reed, CACM 2015]
In our ongoing/near-future research:
- Elastic parallel & distributed, in-network streaming (allowing e.g. embedded devices)
- More concurrent data structures & multicore algorithms for efficient in-memory stream processing
- Processing high-rate sensory data (e.g. LIDAR) & other use cases in CPS & IoT

Thank you
Contact: ptrianta@chalmers.se
Co-authors in work mentioned here (from left to right): M. Almgren, D. Cederman, Z. Fu, V. Gulisano, O. Landsiedel, Y. Nikolakopoulos, M.P., P. Tsigas
EXCESS

At our research team (approx. 30 persons): cyberphysical systems research
[Figure labels: systems security; distributed systems, IoT; parallel & stream computing; demand response in energy; data; Internet of Things; energy-efficient computation; cooperative vehicular systems; resource management, load shaping; microgrids demo/testbeds; data processing: validation, monitoring, prediction; security, privacy; streaming, parallel, multicore; energy efficiency: estimated savings 30-70%; communication & coordination, data-driven situation awareness (new postdoc SAFER); virtual traffic lights / safer crossings; Gulliver demo/testbed]