WAVES BIG DATA PLATFORM FOR REAL-TIME SEMANTIC STREAM MANAGEMENT

OUTLINE
What is WAVES?
Why?
How?
Achievements
Contact

WHAT IS WAVES?
Massive semantic streams empowering an innovative Big Data platform

WHAT IS A DATA STREAM?
Golab & Özsu (2003): A data stream is a real-time, continuous, ordered (implicitly by arrival time or explicitly by timestamp) sequence of items. It is impossible to control the order in which items arrive, nor is it feasible to locally store a stream in its entirety.
Data volumes are massive, and items arrive at a high rate.
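The definition above implies that a stream consumer has to work item by item with bounded memory. The sketch below illustrates that idea. It is not part of the WAVES code base: the `Item` type, the `sliding_mean` function and the sample readings are hypothetical names and data chosen for this example.

```python
from collections import deque
from typing import Iterable, Iterator, NamedTuple, Tuple


class Item(NamedTuple):
    """One stream element (hypothetical shape, for illustration only)."""
    timestamp: float  # explicit ordering key, in seconds
    value: float


def sliding_mean(stream: Iterable[Item], size: int) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, mean of the last `size` values) as each item arrives.

    Only `size` values are kept in memory, so the stream itself is never stored.
    """
    window = deque(maxlen=size)  # the oldest value is discarded automatically
    for item in stream:
        window.append(item.value)
        yield item.timestamp, sum(window) / len(window)


# A generator stands in for an unbounded source: items are produced on demand.
readings = (Item(float(t), v) for t, v in enumerate([2.0, 4.0, 6.0, 8.0]))
means = list(sliding_mean(readings, size=2))
```

The window size is the only state that grows with configuration rather than with the length of the stream, which is the property the definition asks for.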

ABOUT WAVES
"The real innovation in smart cities will come from integrating technologies." (Colette Maloney, Head of Unit, Smart Cities, European Commission)
Smart-city challenges grow more complex with every year of population growth. New megacities are being created at a dizzying pace around the world. Reaching the next level of development will require new ways of thinking that include cutting-edge technologies based on the Internet of Things, the Big Data ecosystem and Linked Data. Atos SE contributes to this paradigm shift by innovating and demonstrating new ways of using data to simplify and improve the management of cities, through the development of a new project called WAVES.
Rapidly growing cities dangerously threaten the availability of critical resources such as potable water. By designing a smart water management application dedicated to preventing leaks in the underground pipeline system, Atos SE aims to raise awareness among the major actors of the water sector. The WAVES project relies on an abstract design that covers various domains where sensor networks and Linked Data are exploited, such as traffic control, power consumption, health care improvement and energy optimization.
Further details are available at http://waves-rsp.org/

ABOUT WAVES
Open source platform: a solution for massive semantic data streams in real time.
CONTEXT AND GOALS
Current use case: water network management for the industrial partner Ondeo Systems.
Current target: detect anomalies in real time in sensor networks.
Various tools: Big Data and Semantic Web technologies for data cleansing, filtering, reasoning and visualizing.
Final objective: design a generic, inference-enabled and distributed platform for RDF stream processing.

WHY WAVES?
Addressing an environmental issue on a global scale

RATIONALE
Big Data: Big Data frameworks were chosen for their capability to deal with high throughput.
Real-time: WAVES supports parallel, real-time query answering over RDF data streams.
SPARQL continuous query: WAVES supports a specific query language to continuously process RDF data streams.
Distribution: distribution and scalability are at the heart of the architectural design, in order to increase throughput.
Multi-purpose stream processing: facing the new challenges of an increasingly connected IoT, WAVES is designed to analyze and act on real-time streaming data using continuous queries. These queries are executed in parallel in a distributed framework and support CEP-based operators over time-annotated RDF triples.
Reasoning engine over RDF data: as a stream processing platform, WAVES aims at handling high volumes of semantic data in real time with a scalable, distributed and fault-tolerant architecture. This enables analysis of data in motion and anomaly detection supported by inference rules and reasoning capabilities.
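To make "inference rules over time-annotated RDF triples" concrete, here is a minimal sketch of one well-known rule, the RDFS subclass rule (if x has type C and C is a subclass of D, then x also has type D). The deck does not say which rules WAVES applies, so the choice of rule, the `TimedTriple` type, the `enrich` function and the `ex:` names are all assumptions made for this illustration.

```python
from typing import Iterator, NamedTuple, Set, Tuple

RDF_TYPE = "rdf:type"


class TimedTriple(NamedTuple):
    """An RDF triple annotated with its arrival time (hypothetical shape)."""
    subject: str
    predicate: str
    obj: str
    timestamp: float


def superclasses(schema: Set[Tuple[str, str]], cls: str) -> Set[str]:
    """Return every class reachable from `cls` through (subclass, superclass) pairs."""
    found: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in schema:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def enrich(schema: Set[Tuple[str, str]], triple: TimedTriple) -> Iterator[TimedTriple]:
    """Emit the incoming triple, then any type triples the subclass rule entails."""
    yield triple
    if triple.predicate == RDF_TYPE:
        for sup in sorted(superclasses(schema, triple.obj)):
            # Inferred triples inherit the timestamp of the triple that caused them.
            yield TimedTriple(triple.subject, RDF_TYPE, sup, triple.timestamp)


# Static background knowledge (assumed example ontology).
schema = {("ex:PressureSensor", "ex:Sensor"), ("ex:Sensor", "ex:Device")}
incoming = TimedTriple("ex:s1", RDF_TYPE, "ex:PressureSensor", 10.0)
inferred = list(enrich(schema, incoming))
```

With this kind of enrichment, a continuous query written against `ex:Sensor` would also match readings from `ex:s1`, even though the stream only states that it is an `ex:PressureSensor`.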

APPLICATION DOMAINS
Website logs
Network monitoring
Financial services
Weather forecasting
Circular economy
E-commerce
Traffic control
Power consumption

HOW?
Combining Big Data and Semantic Web technologies

SYSTEM ARCHITECTURE
Big Data systems: the WAVES architecture relies heavily on three robust components with a solid reputation within the Big Data community: Apache Storm, Kafka, and Redis.
Linked Data principles: WAVES converts sensor data to semantic streaming data based on popular ontologies such as SSN and QUDT. It supports SPARQL queries simultaneously over streaming and static data.
[Architecture diagram; label: RDSZ]
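The conversion from a raw sensor reading to semantic data can be sketched as follows. This is only an illustration under stated assumptions: the deck does not show the actual WAVES mapping, the `to_ntriples` function and the `http://example.org/waves/` namespace are hypothetical, and the terms used come from SOSA (the core of the current W3C SSN ontology) and QUDT as one plausible modelling choice.

```python
from datetime import datetime, timezone
from typing import List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOSA = "http://www.w3.org/ns/sosa/"          # core vocabulary of the W3C SSN ontology
QUDT = "http://qudt.org/schema/qudt/"
UNIT = "http://qudt.org/vocab/unit/"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/waves/"             # hypothetical namespace for this sketch


def to_ntriples(sensor_id: str, value: float, unit: str, when: datetime) -> List[str]:
    """Describe one sensor reading as N-Triples lines (assumed modelling)."""
    obs = f"<{EX}observation/{sensor_id}/{int(when.timestamp())}>"
    result = f"<{EX}result/{sensor_id}/{int(when.timestamp())}>"
    return [
        f"{obs} <{RDF_TYPE}> <{SOSA}Observation> .",
        f"{obs} <{SOSA}madeBySensor> <{EX}sensor/{sensor_id}> .",
        f'{obs} <{SOSA}resultTime> "{when.isoformat()}"^^<{XSD}dateTime> .',
        f"{obs} <{SOSA}hasResult> {result} .",
        f'{result} <{QUDT}numericValue> "{value}"^^<{XSD}double> .',
        f"{result} <{QUDT}unit> <{UNIT}{unit}> .",
    ]


# Example reading: sensor "p42" reports 3.2 bar (sample values, not project data).
triples = to_ntriples("p42", 3.2, "BAR", datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc))
```

Once readings are expressed this way, the same SPARQL vocabulary can address both the stream and static descriptions of the sensors, which is the point the slide makes about querying streaming and static data together.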

SYSTEM ARCHITECTURE
Distribution: distribution in WAVES is enabled by Storm topologies. A topology is a software unit that consumes data from Kafka and executes continuous SPARQL queries. Each topology contains at least one Kafka spout, one windowing bolt, one step bolt and one query bolt.
Modularity: the architecture is generic and multipurpose in order to handle several use cases. It contains pluggable modules, where each module is a self-contained unit in charge of executing some tasks. All modules depend on the core framework.
[Architecture diagram; label: RDSZ]
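The data flow through such a topology can be simulated in a few lines. The sketch below does not use the Storm or Kafka APIs: plain Python generators stand in for the spout and the bolts, a Python function stands in for the continuous SPARQL query, and the role given to the step bolt (dropping incomplete events) is an assumption, since the deck does not describe what that bolt does.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, float]   # e.g. {"ts": 4.0, "pressure": 1.2} (hypothetical shape)
Window = List[Event]


def spout(messages: Iterable[Event]) -> Iterator[Event]:
    """Stand-in for the Kafka spout: emits events one at a time."""
    yield from messages


def windowing_bolt(events: Iterable[Event], width: float) -> Iterator[Window]:
    """Group time-ordered events into tumbling windows of `width` seconds."""
    current: Window = []
    window_id = None
    for event in events:
        event_window = int(event["ts"] // width)
        if window_id is not None and event_window != window_id:
            yield current
            current = []
        window_id = event_window
        current.append(event)
    if current:
        yield current


def step_bolt(windows: Iterable[Window]) -> Iterator[Window]:
    """Assumed pre-processing step: drop events that carry no measurement."""
    for window in windows:
        yield [event for event in window if "pressure" in event]


def query_bolt(windows: Iterable[Window], query: Callable[[Window], Window]) -> Iterator[Window]:
    """Run the continuous query on every window and emit non-empty results."""
    for window in windows:
        matches = query(window)
        if matches:
            yield matches


def low_pressure(window: Window) -> Window:
    """Stand-in for a continuous query: select readings below 2.0."""
    return [event for event in window if event["pressure"] < 2.0]


messages = [
    {"ts": 0.0, "pressure": 3.0},
    {"ts": 4.0, "pressure": 1.2},
    {"ts": 11.0, "pressure": 3.1},
    {"ts": 12.0},                      # incomplete event, removed by step_bolt
]
alerts = list(query_bolt(step_bolt(windowing_bolt(spout(messages), width=10.0)), low_pressure))
```

In a real topology each stage would run as parallel tasks on several machines; the chain of generators only shows the order in which data moves from the spout to the query.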

TECHNOLOGIES & CHALLENGES
Technologies: RDF data, real-time processing, distributed environment, openness and security.
Technical challenges: massive semantic streams, on-the-fly computation, robust and secure architecture.
Human challenges: a consortium of 5 members (1 start-up, 2 large companies and 2 academic partners) and the management of various profiles.

ACHIEVEMENTS
Realizations and current stage of progress

STATUS & ROAD MAP
Stages: 01 Conception, 02 Development, 03 Testing, 04 Deployment, 05 Integration, 06 Production.
Progress figures shown on the slide: 90 %, 80 %, 80 %, 70 %, 60 %, 50 %.
Evolution analysis: 80 %.
Over recent years, the WAVES project went through several steps, from the first submission for financial support to the final integration and production stages.
The organization of the project around a consortium of different members (i.e. 1 start-up, 2 large companies and 2 academic partners) allowed the team members to bring innovation and diversity to solving complex problems.

FULFILMENTS & REALIZATIONS
01 New API: WAVES is open source and provides a new Java API for developers, available at http://waves-rsp.org/api/
02 Friendly UI: WAVES relies on a simple and effective graphical interface that lets users configure the reasoning workflow and interact easily with the application.
03 High performance: compared to other engines, WAVES reaches a higher level of accuracy and fast processing under a heavy input load.
Processing steps:
Cleansing: eliminating outliers and dealing with absent values.
Semantization: converting sensor measures to semantic data.
Compression (RDSZ): downsizing the amount of data to reduce network overhead.
Querying: distributing queries over several machines for fast processing.
Visualizing: exposing the results of queries in a smart and attractive interface.
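As an illustration of the cleansing step, the sketch below drops out-of-range readings and fills absent values with the last valid one. The deck does not state which cleansing method WAVES uses, so the range check, the fill strategy and the `cleanse` function name are assumptions made for this example.

```python
from typing import Iterable, Iterator, Optional


def cleanse(values: Iterable[Optional[float]], low: float, high: float) -> Iterator[float]:
    """Yield cleaned readings from a stream of raw sensor values.

    - Readings outside [low, high] are treated as outliers and dropped.
    - Absent readings (None) are replaced by the last valid reading.
    - Absent readings seen before any valid one are skipped.
    """
    last_valid: Optional[float] = None
    for value in values:
        if value is None:
            if last_valid is not None:
                yield last_valid
            continue
        if not (low <= value <= high):
            continue  # outlier: eliminated
        last_valid = value
        yield value


# Sample pressure readings in bar (made-up data): one gap at the start,
# one implausible spike, one gap in the middle.
raw = [None, 3.0, 250.0, None, 3.2]
cleaned = list(cleanse(raw, low=0.0, high=10.0))
```

Because the function keeps only the last valid reading as state, it fits the bounded-memory constraint of stream processing and could run before semantization.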

CONTACT
Feel free to contact us! We are friendly and social.
Address: 80 Quai Voltaire, 95870 Bezons, France
Social: @AtosFR, AtosFR
Email: contact@waves-rsp.org