Mastering the Velocity Dimension of Big Data



Similar documents
Processing Flows of Information: From Data Stream to Complex Event Processing

Aurora: a new model and architecture for data stream management

Processing Flows of Information: From Data Stream to Complex Event Processing

Solving big data problems in real-time with CEP and Dashboards - patterns and tips

Complex Information Management Using a Framework Supported by ECA Rules in XML

Industry 4.0 and Big Data

Social Data Science for Intelligent Cities

Low Latency Complex Event Processing on Parallel Hardware

Management Consulting Systems Integration Managed Services WHITE PAPER DATA DISCOVERY VS ENTERPRISE BUSINESS INTELLIGENCE

Data Stream Management System for Moving Sensor Object Data

Dynamic M2M Event Processing Complex Event Processing and OSGi on Java Embedded

The Synergy of SOA, Event-Driven Architecture (EDA), and Complex Event Processing (CEP)

An XML Framework for Integrating Continuous Queries, Composite Event Detection, and Database Condition Monitoring for Multiple Data Streams

ECS 165A: Introduction to Database Systems

Toward Effective Big Data Analysis in Continuous Auditing. By Juan Zhang, Xiongsheng Yang, and Deniz Appelbaum

Kai Wähner. The Next-Generation BPM for a Big Data World: Intelligent Business Process Management Suites (ibpms)

Big Data Driven Knowledge Discovery for Autonomic Future Internet

Event-based middleware services

Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange

Data Quality in Information Integration and Business Intelligence

Data Stream Management System

Chapter 7, System Design Architecture Organization. Construction. Software

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

A Practical Approach to Process Streaming Data using Graph Database

Big Data Management Assessed Coursework Two Big Data vs Semantic Web F21BD

An Approach for Knowledge-Based IT Management of Air Traffic Control Systems

Are You Ready for Big Data?

A Framework for Ontology-based Context Base Management System

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Data collection architecture for Big Data

Enabling Self Organising Logistics on the Web of Things

Big Data: Opportunities and Challenges. Raja Chiky

Querying RDF Streams with C-SPARQL

CSE 544 Principles of Database Management Systems. Magdalena Balazinska (magda) Winter 2009 Lecture 1 - Class Introduction

Big Data and Semantic Web in Manufacturing. Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India

De la Business Intelligence aux Big Data. Marie- Aude AUFAURE Head of the Business Intelligence team Ecole Centrale Paris. 22/01/14 Séminaire Big Data

Programming Without a Call Stack: Event-driven Architectures

Sensor Information Representation for the Internet of Things

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

M2M Communications and Internet of Things for Smart Cities. Soumya Kanti Datta Mobile Communications Dept.

HOW TO DO A SMART DATA PROJECT

Oracle SOA Suite: The Evaluation from 10g to 11g

How To Use Semantics In A System

A collaborative platform for knowledge management

Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics

Transforming the Telecoms Business using Big Data and Analytics

TopBraid Insight for Life Sciences

Data Stream Management and Complex Event Processing in Esper. INF5100, Autumn 2010 Jarle Søberg

Management of Human Resource Information Using Streaming Model

CSE 544 Principles of Database Management Systems. Magdalena Balazinska (magda) Fall 2007 Lecture 1 - Class Introduction

Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery

Iotivity Programmer s Guide Soft Sensor Manager for Android

Oracle Data Integrator: Administration and Development

CARRIOTS TECHNICAL PRESENTATION

Smart Grid - Big Data visualized in GIS

Semantic Business Analytics in Industrial Facilities a Case Study

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX

Find the Information That Matters. Visualize Your Data, Your Way. Scalable, Flexible, Global Enterprise Ready

Introduction. Background

Complex Event Processing with T-REX

Taming Big Data Variety with Semantic Graph Databases. Evren Sirin CTO Complexible

Secure Semantic Web Service Using SAML

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

Business-Driven Software Engineering Lecture 3 Foundations of Processes

Survey of Distributed Stream Processing for Large Stream Sources

DBMS / Business Intelligence, SQL Server

Report on the Dagstuhl Seminar Data Quality on the Web

JOURNAL OF OBJECT TECHNOLOGY

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

IT outsourcing and cloud computing: definitions, models, and emerging trends. Index. From IT outsourcing to cloud computing: a

Data Integration and Fusion using RDF

Where is... How do I get to...

Transcription:

Mastering the Velocity Dimension of Big Data Emanuele Della Valle DEIB - Politecnico di Milano emanuele.dellavalle@polimi.it

It's a streaming world Agenda Mastering the velocity dimension with informaeon flow processing Hands- on on esper and the Event Processing Language Stream Reasoning: adding the variety dimension What about Veracity? 2

It's a streaming World! 1/2 Off- shore oil operaeons Smart CiEes Global Contact Center Social networks Generate data streams! 3

It's a streaming World! 2/2 What is the expected Eme to failure when that turbine's barring starts to vibrate as detected in the last 10 minutes? Is public transportaeon where the people are? Who are the best available agents to route all these unexpected contacts about the tariff plan launched yesterday? Who is driving the discussion about the top 10 emerging topics? E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel It's a Streaming World! Reasoning upon Rapidly Changing InformaEon. IEEE Intelligent Systems 24(6): 83-89 (2009) 4

Requirements 1/8 A system able to answer those queries must be able to handle massive datasets A typical oil produceon plaborm is equipped with about 400.000 sensors Telecom data is the most pervasive data source in urban are, in Milano there are 1.8 million mobile users A global contact centre of a Telecom operator counts 500 millions of clients Facebook alone has 1.1 billion of aceve users 5

Requirements 2/8 A system able to answer those queries must be able to process data streams on the fly The sensors on typical oil produceon plaborm generates 10,000 observaeons per minute with peaks of 100,000 o/m The mobile users in Milano generates 20,000 call/sms/data conneceons per minute with peaks of 80,000 c/m A global contact centre receives 10,000 contacts per minute with peaks of 30,000 c/m Facebook, as of May 2013, observes 3 millions "I like" per minute 6

Requirements 3/8 A system able to answer those queries must be able to cope with heterogeneous dataset The sensors on typical oil produceon have been deployed over 10 years by 10s of different producers Tens of data sources are normally needed to make sense of an urban phenomena A global contact centre consists in 100s of offices owned by different subsidiary companies engaged yearly Each social network has its own data model, APIs, 7

Requirements 4/8 A system able to answer those queries must be able to cope with incomplete data 10s of sensors and networking links broke down daily Coverage is incomplete Only standard cases are covered by fully machine processable data records 100s of contacts per minute are manage ad- hoc ConversaKons happen outside the social networks, too! 8

Requirements 5/8 A system able to answer those queries must be able to cope with uncertain data Sensor out- of- operakng range Faulty sensors Agents misunderstand, get Kred, Irony, sarcasm, 9

Requirements 6/8 A system able to answer those queries must be able to provide reackve answers deteceon of dangerous situaeons must occur within minutes recommendaeons to ciezens must be performed in few seconds roueng a contact through each step of the decision tree must take less than a second Search autocompleeng may need to be updated every few minutes 10

Requirements 7/8 A system able to answer those queries must be able to support fine- grained informakon access IdenEfy a turbine among thousands Locate a bus among thousands Contact an agent among thousands IdenEfy an opinion maker among thousands of influencers for a topic 11

Requirements 8/8 A system able to answer those queries must be able to integrate complex domain models of operakonal and control process various city aspects contact management, contract types, agent skills, contactor profiles, topics, user profiles, 12

Requirements (wrap up) A system able to answer those queries must be able to handle massive datasets x process data streams on the fly cope with heterogeneous dataset cope with incomplete data cope with uncertain data provide reackve answers support fine- grained informakon access integrate complex domain models 13 x Volume' x x x Velocity' x x x x Variety' x Veracity'

Other domains Intrusion deteceon Fraud DetecEon Emergency Response Services TransportaEon and LogisEcs Supply Chain OpEmizaEon System monitoring Click inspeceon... 14

Mastering the Velocity dimension with InformaEon Flow Processing (IFP) solueons 15

Velocity: Say goodbye to DBMS The DBMS way query The IFP way data data reply 16 query reply

IFP - Gartner The Gartner hype cycle 17

IFP - Forrester Forrester s top 15 emerging tech to watch: Now to 2018 18

Is there a market beyond hype? Complex Event Processing Market Worth $3,322M by 2018 (2014 Report by MarketsandMarkets) Major players include: Microsom IBM Oracle SAP Tibco... 19

InformaEon Flow Processing The IFP engine processes incoming flows of informa1on according to a set of processing rules Processing is on line Sources produce the incoming informaeon flows, sinks consume the results of processing, rule managers add or remove rules InformaEon flows are composed of informa1on items Items part of the same flow are neither necessarily ordered nor of the same kind Processing involve filtering, combining, and aggregaeng flows, item by item as they enter the engine Sources Informa1on Flows 20 IFP Engine Informa1on Flows - - - - - - - - - - - - - - - Rules - - - - - - - - - - - - - - - - - - - - - - - - - Rule managers Sinks

IFP: A bit of history of two approaches Traditional DBMS Eventbased Systems Active DBMS DSMS 21 CEP

From Passive to AcEve DBMSs Standard DBMSs Purely passive: Human- ac1ve database- passive (HADP) ExecuEon happens only when asked by clients (through queries) AcEve DBMSs The reaceve behavior moves (in part) from the applicaeon to the DB layer which executes Event CondiEon AcEon (ECA) rules 22

As a DBMS extension AcEve DBMSs Rules may only refer to the internal state of the DB Closed DB applicaeons Rules may support the semanecs of the applicaeon, but external sources of events are not allowed But events may come from external sources 23

Data Stream Management Systems (DSMS) Data streams are (unbounded) sequences of Eme- varying data elements Represent: an (almost) conenuous flow of informaeon Eme with the recent informaeon being more relevant as it describes the current state of a dynamic system 24

Data Stream Management Systems (DSMS) The nature of streams requires a paradigmaec change* from persistent data one Eme semanecs to transient data conenuous * This paradigmaec change first arose in DB community in the late '90s 25

ConEnuous SemanEcs ConEnuous queries registered over streams that are observed trough windows window Dynamic System 26 input streams Registered ConEnuous Query streams of answer

Event- based systems Components collaborate by exchanging informaeon about occurrent events. In parecular: Components publish noeficaeons about the events they observe, or they subscribe to the events they are interested to be noefied about CommunicaEon is: Purely message based Asynchronous MulEcast Implicit Anonymous topic=fire* & place=* topic=* & place=1 st floor topic=fire alarm & place=* 27 fire training at 1 st floor fire alarm at 1 st fire floor training at 1 st floor fire alarm at 1 st fire floor training at 1 st floor fire alarm at 1 st floor fire alarm at 1 st floor

Complex Event Processing (CEP) CEP systems adds the ability to deploy rules that describe how composite events can be generated from primieve (or composite) ones Typical CEP rules search for sequences of events Raise C if A B Time is a key aspect in CEP 28 - - - - - - - - - - - - - - - - - - - - Rules

Transforming vs. DetecEng Systems Transforming rules Define an execueon plan combining primieve operators Each operator transforms one or more input flows into one or more output flows The execueon plan can be defined: explicitly (e.g., through graphical notaeon) Usually, deteceon is based on a implicitly (using a high level language) Omen used with homogeneous informaeon flows To take advantage of the predefined structure of input and output 29 DetecKng rules Present an explicit disenceon between a detec1on and produc1on phase Detect a situaeon of interest Produce a complex event to noefy that the situaeon has happened logical predicate that captures paferns of interest in the history of received items

DeclaraKve Transforming languages Specify the expected result rather than the desired execueon flow Usually derive from relaeonal languages ImperaKve Specify the desired execueon flow StarEng from primieve operators Can be user- defined RelaEonal algebra / SQL CQL/Stream: Select IStream(*) From F1[Rows 5], F2[Rows 10] Where F1.A = F2.A 30 Usually adopt a graphical notaeon Arvind Arasu, Shivnath Babu, Jennifer Widom: The CQL conenuous query language: semanec foundaeons and query execueon. VLDB J. 15(2): 121-142 (2006)

ImperaEve Languages Aurora (Boxes & Arrows Model) 31 Daniel J. Abadi, Donald Carney, Ugur ÇeEntemel, Mitch Cherniack, ChrisEan Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, Stanley B. Zdonik: Aurora: a new model and architecture for data stream management. VLDB J. 12(2): 120-139 (2003)

DeclaraEve languages Specify a firing condi1on as a pa?ern Select a poreon of incoming flows through Logic operators Content / Eming constraints The ac1on part uses the selected items to produce new knowledge TESLA / T-Rex Define Fire(area: string, measuredtemp: double) From Smoke(area=$a) and last Temp(area=$a and value>45) within 5 min. from Smoke Where area=smoke.area and measuredtemp=temp.value Gianpaolo Cugola, Alessandro Margara: TESLA: a formally defined event specificaeon language. DEBS 2010: 50-61 Gianpaolo Cugola, Alessandro Margara: Complex event processing with T- REX. Journal of Systems and Somware 85(8): 1709-1728 (2012) 32

Hands- on Esper and EPL See dedicated slides 33

Adding Variety to Velocity 34

Variety: Syntax changes, semanecs not Data comes from muleple sources in muleple formats The SemanEc Web is teaching us how to represent informaeon (knowledge) in a standard way RDF, OWL, Ontologies, SPARQL,... But the world in not staec... 35

DSMS/CEP + SemanEc Web = Stream Reasoning Requirement massive datasets data streams heterogeneous dataset DSMS CEP SemanKc Web incomplete data noisy data reaceve answers fine- grained informaeon access complex domain models The Stream Reasoning Era is coming 36

IFP: A bit of history (ConEnued) Traditional DBMS Eventbased Systems Active DBMS DSMS Semantic Web Stream Reasoning 37 CEP

Research QuesEon is it possible to make sense in real Kme of mulkple, heterogeneous, gigankc and inevitably noisy and incomplete data streams in order to support the decision process of extremely large numbers of concurrent user? E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel It's a Streaming World! Reasoning upon Rapidly Changing InformaEon. IEEE Intelligent Systems 24(6): 83-89 (2009) 38

Stream Reasoning feasibility (intuieon) Many relevant reasoning methods are not able to deal with high frequency data streams However, trade- off exists between the complexity of the reasoning method and the frequency of the data stream NEXPTIME AbstracEon DL Reasoning 1 Hz Complexity PTIME AC 0 SelecEon InterpretaEon DL- Lite SemanEc Streams Raw Stream Processing Complexity vs. Dynamics Querying Re- wrieng 10 4 Hz Change Frequency Heiner Stuckenschmidt, Stefano Ceri, Emanuele Della Valle, Frank van Harmelen: Towards Expressive Stream Reasoning. Proceedings of the Dagstuhl Seminar on SemanEc Aspects of Sensor Networks, 2010. 39

PracEcal cases 10+ deployments in Sensor Networks, and Social media analyecs BOTTARI City Data Fusion Winner of SemanKc Web Challenge 2011 Winner of IBM faculty award 2013 M. Balduini, I. Celino, D. Dell Aglio, E. Della Valle, Y. Huang, T. Lee, S.- H. Kim, V. Tresp: BOTTARI: An augmented reality mobile applicaeon to deliver personalized and locaeon- based recommendaeons by conenuous analysis of social media streams. J. Web Sem. 16: 33-41 (2012) 40

Where to learn more about stream General Info reasoning h?p://streamreasoning.org W3C Community h?p://www.w3.org/community/rsp/ Papers h?p://scholar.google.com/scholar?hl=en&q=stream +reasoning 41

What about Veracity? 42

Support for Uncertainty Many applicakons deal with imprecise (or even incorrect) data from sources E.g., sensors, RFID, The ability to associate a degree of uncertainty to informaeon items increases expressiveness To the content of items Imprecise temperature reading To the presence of an item (occurrence of an event) Spurious RFID reading 43 Two orthogonal aspects Support for uncertain input Allows rules to deal with/ reason about uncertain input data Support for uncertain output Allows rules to associate a degree of uncertainty to the output produced Uncertain rules

Where to learn more on InformaEon Flow Processing PhD course on Stream And Complex Event Processing Lecturers Emanuele Della Valle Gianpaolo Cugola Alessandro Margara More info h?p://bit.ly/phd- course- IFP 44

Thank you! Any QuesEon? Emanuele Della Valle DEIB - Politecnico di Milano emanuele.dellavalle@polimi.it