An Open-Source Streaming Machine Learning and Real-Time Analytics Architecture

Transcription:

An Open-Source Streaming Machine Learning and Real-Time Analytics Architecture: Using an IoT Example
Fred Melo (@fredmelo_br), William Markito (@william_markito)

Traditional Data Analytics - Limitations
Flow: Store, Data Lake (HDFS), Analytics
Limitations: no real-time information; ETL based; data-source specific; hard to change; labor intensive; inefficient

Stream-based, Real-Time Closed-Loop Analytics
Components: Data Stream Pipeline; In-Memory Real-Time Data; Data Lake (HDFS); Expert System / Machine Learning
Properties: multiple data sources; real-time processing; store everything; continuous learning; continuous improvement; continuous adapting

A Streaming Machine Learning for IoT Example: Predictive Maintenance Scenario
Real-time: the smart system evaluates LIVE DATA from sensors. "According to historical trends, there's an 80% chance this equipment will fail in the next 12 hours."
Live data becomes historical over time.
Historical: the smart system learns from HISTORICAL TRENDS. "How were the temperature and vibration sensors reading when the latest failures happened?"

Streaming Machine Learning: Info -> Machine Learning -> Analysis
Machine learning step: look at past trends (for similar input), evaluate current input, score / predict
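The three steps on this slide (look at past trends for similar input, evaluate the current input, score) can be sketched with a nearest-neighbour lookup. This is only an illustration, not the model used in the talk: the `HISTORY` rows, the feature choice (temperature and vibration, taken from the predictive maintenance slide) and the function name `failure_probability` are all assumptions.

```python
import math

# Hypothetical historical readings: (temperature, vibration, failed_soon_after).
HISTORY = [
    (60.0, 0.20, False),
    (62.0, 0.30, False),
    (61.0, 0.25, False),
    (90.0, 1.50, True),
    (92.0, 1.70, True),
    (88.0, 1.40, True),
]


def failure_probability(reading, history, k=3):
    """Score a live (temperature, vibration) reading against past trends.

    Returns the fraction of the k most similar historical readings
    that were followed by a failure.
    """
    if not history:
        raise ValueError("history must not be empty")
    # Look at past trends for similar input: rank history by distance.
    nearest = sorted(history, key=lambda row: math.dist(reading, row[:2]))[:k]
    # Score / predict: share of similar past readings that ended in failure.
    return sum(1 for row in nearest if row[2]) / len(nearest)
```

Because the features are not scaled, temperature dominates the distance here; a real model would normalise its inputs first.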

Streaming Machine Learning (pipeline built up one stage per slide)
Info [JSON] -> Filter -> Machine Learning -> Analysis
Info -> Filter -> Enrich -> Machine Learning -> Analysis
Info -> Filter -> Enrich -> Transform -> Machine Learning -> Analysis
Info -> Filter -> Enrich -> Transform -> ML Model -> Analysis
Info -> Filter -> Enrich -> Transform -> ML Model -> Analysis, with an additional Transform stage
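The filter, enrich, transform and model stages can be sketched as chained Python generators, each consuming the previous stage's output. Everything specific here is an assumption made for illustration: the event fields (`sensor`, `temperature`), the `SENSOR_LOCATIONS` lookup, the Fahrenheit-to-Celsius conversion, and the threshold rule that stands in for a trained ML model.

```python
import json

# Hypothetical lookup table used by the enrich stage.
SENSOR_LOCATIONS = {"s1": "pump-room", "s2": "boiler"}


def parse(lines):
    """Info: decode each incoming JSON message."""
    for line in lines:
        yield json.loads(line)


def filter_valid(events):
    """Filter: drop events that carry no temperature reading."""
    for event in events:
        if event.get("temperature") is not None:
            yield event


def enrich(events):
    """Enrich: attach the sensor's location from reference data."""
    for event in events:
        location = SENSOR_LOCATIONS.get(event["sensor"], "unknown")
        yield {**event, "location": location}


def transform(events):
    """Transform: convert Fahrenheit readings to Celsius."""
    for event in events:
        celsius = (event["temperature"] - 32) * 5 / 9
        yield {**event, "temperature_c": celsius}


def score(events, threshold_c=80.0):
    """ML Model stand-in: a fixed threshold instead of a trained model."""
    for event in events:
        yield {**event, "alert": event["temperature_c"] > threshold_c}


def run_pipeline(lines):
    """Chain the stages and collect the scored events for analysis."""
    return list(score(transform(enrich(filter_valid(parse(lines))))))
```

Generators keep each stage independent and process one event at a time, which mirrors how a stream moves through separate modules.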

Streaming Machine Learning: ML Model -> (update) -> In-Memory Data Grid -> (push) -> Front-end

Streaming Machine Learning - Supervised Learning Example
Neural network: trained from the in-memory data grid; real-time scoring

A Streaming Machine Learning Reference Architecture
Components: Spring XD (Ingest, Transform, Sink); Fast Data; Distributed Computing; Store / Analyze; Predict / Machine Learning; JMS; other sources and destinations

Indoor Localization - Applied Example

Trilateration and its limitations: noisy data; physical barriers; large overlap areas; moving targets; inaccuracy

Particle Filters - Calculating the optimum solution
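The slides give no detail on the particle filter itself, so the following is only a minimal sketch of the general technique under stated assumptions: a target in a square room, a random-walk motion model, Gaussian measurement noise, and one tuple of distances to the antennas per time step. The function name `particle_filter_locate` and all parameter values are hypothetical.

```python
import math
import random


def particle_filter_locate(anchors, measurements, n=2000, sigma=0.5,
                           jitter=0.1, size=10.0, seed=0):
    """Estimate a 2-D position from repeated distance measurements.

    anchors:      antenna positions [(x, y), ...]
    measurements: one tuple of measured distances per time step
    """
    rng = random.Random(seed)
    # Start with particles spread uniformly over the room.
    particles = [(rng.uniform(0, size), rng.uniform(0, size))
                 for _ in range(n)]
    for distances in measurements:
        # Weight each particle by how well it explains the measurement.
        weights = []
        for p in particles:
            w = 1.0
            for anchor, d in zip(anchors, distances):
                err = math.dist(p, anchor) - d
                w *= math.exp(-0.5 * (err / sigma) ** 2)
            weights.append(w)
        if sum(weights) == 0.0:
            weights = [1.0] * n  # all particles implausible: keep them all
        # Resample in proportion to weight, then apply random-walk motion.
        particles = rng.choices(particles, weights=weights, k=n)
        particles = [(x + rng.gauss(0, jitter), y + rng.gauss(0, jitter))
                     for x, y in particles]
    # The estimate is the mean of the surviving particles.
    return (sum(x for x, _ in particles) / n,
            sum(y for _, y in particles) / n)
```

Unlike plain trilateration, the particle cloud tolerates noisy or inconsistent distances, because no single measurement has to be satisfied exactly.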

The Solution
1. Capture signal strength
2. Calculate distance from antenna
3. Trilaterate different sensors to predict location in real-time
4. Show on a map with live updates
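Steps 2 and 3 can be sketched as follows. The slides do not say how distance is derived from signal strength, so the log-distance path-loss model, the reference power `tx_power` (RSSI at 1 m) and the exponent `n` are assumptions, as are the function names. The trilateration subtracts pairs of circle equations to get a 2x2 linear system, which is the standard closed form for three antennas.

```python
def rssi_to_distance(rssi, tx_power=-59.0, n=2.0):
    """Step 2: estimate distance in metres from signal strength in dBm.

    Assumes the log-distance path-loss model; tx_power is the
    (hypothetical) RSSI measured at 1 m, n the path-loss exponent.
    """
    return 10 ** ((tx_power - rssi) / (10 * n))


def trilaterate(anchors, distances):
    """Step 3: locate a device from three antennas and three distances."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    # Subtracting circle equations pairwise leaves two linear equations:
    #   a*x + b*y = c   and   d*x + e*y = f
    a, b = 2 * (x2 - x1), 2 * (y2 - y1)
    c = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    d, e = 2 * (x3 - x2), 2 * (y3 - y2)
    f = r2**2 - r3**2 - x2**2 + x3**2 - y2**2 + y3**2
    det = a * e - b * d
    if det == 0:
        raise ValueError("antennas must not be collinear")
    # Solve the 2x2 system with Cramer's rule.
    return ((c * e - b * f) / det, (a * f - c * d) / det)
```

With exact distances the result is exact; with noisy distances the three circles no longer meet in one point, which is the limitation the earlier slide lists.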

Architecture Overview
Elements shown: JSON over HTTP input; Spring XD stream (Ingest, Transform, Sink); Calculate Device Distance (Groovy + Distance); Predict Location; Spring Boot; GUI; Application Platform

Geode Basic Concepts
Cache: configurable through XML, Java
Region: a distributed java.util.Map "on steroids"; highly available, redundant
Member: Locator, Server, Client
Callbacks: Listener, Writer, AsyncEventListener (parallel/serial)
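The Region and Callbacks ideas can be illustrated with a plain Python map that notifies callbacks on every update. This is a conceptual sketch only and not the Geode API: the `Region` class and its methods are hypothetical, and it is neither distributed nor redundant. It assumes a writer runs before a change is applied and may reject it, while a listener runs after the change.

```python
class Region:
    """A single-process, map-like store with writer and listener callbacks."""

    def __init__(self):
        self._data = {}
        self._writers = []    # called before a change; may raise to veto it
        self._listeners = []  # called after a change has been applied

    def add_writer(self, callback):
        self._writers.append(callback)

    def add_listener(self, callback):
        self._listeners.append(callback)

    def put(self, key, value):
        old = self._data.get(key)
        for writer in self._writers:
            writer(key, old, value)      # an exception aborts the update
        self._data[key] = value
        for listener in self._listeners:
            listener(key, old, value)    # e.g. push the change to a front end

    def get(self, key):
        return self._data.get(key)
```

A listener registered this way is the kind of hook that lets an update to the data grid be pushed to a front end without polling.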

Introduction to Spring XD: runs as a distributed application or as a single node

Spring XD: A stream is composed of modules. Each module is deployed to a container and its channels are bound to the transport.

Demo

Why we selected these projects
Iterative and exploratory model; web-based REPL; multiple interpreters (Apache Geode, Apache Spark, Markdown, Flink, Python)
Productivity; built-in connectors; cloud agnostic; highly scalable; easy to set up; streams without coding
In-memory and persistent; highly consistent; extreme transaction processing; thousands of concurrent clients; reliable event model

Source code and detailed instructions available at: https://github.com/pivotal-open-source-hub/wifianalyticsiot
Follow us on GitHub! Fred Melo (@fredmelo_br), William Markito (@william_markito)

Implementing a Highly Scalable In-Memory Stock Prediction System with Apache Geode (incubating), R and Spring XD
Room: Tohotom - 14:30, Sep 30
Fred Melo (@fredmelo_br), Pivotal; William Markito (@william_markito), Pivotal