Time series IoT data ingestion into Cassandra using Kaa




Time series IoT data ingestion into Cassandra using Kaa
Andrew Shvayka, ashvayka@cybervisiontech.com

Agenda
  Data ingestion challenges
  Why Kaa?
  Why Cassandra?
  Reference architecture overview
  Hands-on
    Sandbox setup
    Raspberry Pi application code walkthrough
    Cassandra appender configuration
    Live demo
  Q&A

Data ingestion requirements/challenges
  Must have:
    Guaranteed data delivery
    Scalability
    Security
    Performance
    Low latency
  Nice to have:
    Built-in data structure validation
    Device platform independent
    Low footprint
    Low bandwidth support

Why Kaa?
  Fully-featured IoT middleware platform
  10 KB RAM footprint (with C SDK)
  Guaranteed data delivery and reliable local storage
  Built-in transport security
  Efficient data serialization
  Horizontally scalable and fault tolerant
  100% open-source (Apache License 2.0)
  Rapid application development using C / C++ / Java SDKs
  Integration with popular device platforms

Why Cassandra?
  Fault tolerant
  Performant
  Horizontally scalable
  Easy deployment
  Integration with popular analytics platforms

Problem description
  (Diagram: Region 1, Region 2, Region 3, Region 4)

Reference architecture
  (Diagram)
  Devices: Raspberry Pi with temperature sensor (DHT11), each running a client application with the Kaa SDK
  Kaa cluster/sandbox: Kaa nodes, each with a Cassandra appender
  Data flow: raw data from the devices, structured data into Cassandra

Development environment setup
  Sample project repository: https://github.com/kaaproject/kaa-cassandrasample
  Kaa Sandbox: http://www./download-kaa
  Raspberry Pi: http://docs./display/kaa/raspberry+pi

Data modeling
  Kaa data collection schema:
    {
      "type" : "record",
      "name" : "SensorData",
      "namespace" : "org.kaaproject.kaa.sample",
      "fields" : [
        { "name" : "sensorid", "type" : "string"},
        { "name" : "model", "type" : "string"},
        { "name" : "region", "type" : "string"},
        { "name" : "value", "type" : "float"}
      ]
    }
  Cassandra data models covered:
    Single sensor per row
    Single sensor per row, with date partitions and TTL
    Sensors per region and model, with date partitions
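A minimal sketch of what the "built-in data structure validation" of such a schema amounts to, written in plain Python rather than with the Kaa SDK or an Avro library. The field names and types are taken from the SensorData schema above; the `validate` helper and the `SensorData` dataclass are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Field names and types copied from the SensorData schema above.
SCHEMA_FIELDS = {"sensorid": str, "model": str, "region": str, "value": float}


@dataclass(frozen=True)
class SensorData:
    sensorid: str
    model: str
    region: str
    value: float


def validate(record: dict) -> SensorData:
    """Return a SensorData if the record matches the schema, else raise."""
    missing = SCHEMA_FIELDS.keys() - record.keys()
    extra = record.keys() - SCHEMA_FIELDS.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, expected in SCHEMA_FIELDS.items():
        value = record[name]
        # Assumption: plain ints are accepted for the float field, bools are not.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            continue
        if not isinstance(value, expected):
            raise TypeError(f"{name}: expected {expected.__name__}")
    return SensorData(
        sensorid=record["sensorid"],
        model=record["model"],
        region=record["region"],
        value=float(record["value"]),
    )
```

For example, `validate({"sensorid": "s1", "model": "DHT11", "region": "Region A", "value": 21.5})` returns a `SensorData` instance, while a record with a missing field raises `ValueError`.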

Single sensor per row
  Partition key: Sensor ID
  Clustering key: Timestamp
  Fields: Region, Model, Value, JSON, BLOB
  Row layout: Sensor ID | Timestamp 1: Fields | Timestamp 2: Fields | ... | Timestamp N: Fields
  Query: select * from sensor_per_row where sensor_id = 'Sensor 1' and ts > 42
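The single-sensor-per-row model can be pictured as one sorted list of readings per sensor. The sketch below is a toy in-memory stand-in, not Cassandra and not the sample project's code: the class name `SensorPerRow` and its methods are hypothetical. It only illustrates why a partition key plus a clustering key makes the range query above cheap: one partition lookup, then a binary search over ordered timestamps.

```python
import bisect
from collections import defaultdict


class SensorPerRow:
    """Toy model: partition key = sensor_id, clustering key = ts."""

    def __init__(self):
        # sensor_id -> list of (ts, fields), kept sorted by ts
        self._partitions = defaultdict(list)

    def insert(self, sensor_id, ts, fields):
        row = self._partitions[sensor_id]
        keys = [t for t, _ in row]
        idx = bisect.bisect_left(keys, ts)
        if idx < len(row) and row[idx][0] == ts:
            row[idx] = (ts, fields)  # same primary key: overwrite
        else:
            row.insert(idx, (ts, fields))

    def select_after(self, sensor_id, ts):
        """Rough equivalent of: where sensor_id = ? and ts > ?"""
        row = self._partitions.get(sensor_id, [])
        keys = [t for t, _ in row]
        return row[bisect.bisect_right(keys, ts):]
```

One consequence of this layout is that a partition grows without bound as a sensor keeps reporting, which is what the date-partitioned variants address.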

Single sensor per row, with date partitions and TTL
  Partition key: Sensor ID, Date
  Clustering key: Timestamp
  Fields: Region, Model, Value
  TTL: 60 sec
  Row layout: Sensor ID, Date | Timestamp N: Fields | Timestamp N-1: Fields | ... | Timestamp 1: Fields
  Query: select * from sensor_per_date where sensor_id = 'Sensor 1' and date = '2015/09/10' and ts > 42
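A small sketch of how a reading maps to a (Sensor ID, Date) partition and when it falls out of the 60-second TTL. The helper names are hypothetical, and bucketing by UTC calendar day is an assumption; the slide does not say which time zone or date format the sample uses.

```python
from datetime import datetime, timezone

TTL_SECONDS = 60  # TTL value given on the slide


def partition_key(sensor_id, ts_seconds):
    """Composite partition key: (sensor_id, UTC calendar day)."""
    day = datetime.fromtimestamp(ts_seconds, tz=timezone.utc).strftime("%Y/%m/%d")
    return (sensor_id, day)


def is_expired(written_at, now, ttl=TTL_SECONDS):
    """Treat a reading as expired once ttl seconds have elapsed since the write."""
    return now - written_at >= ttl
```

Adding the date to the partition key caps each partition at one day of readings per sensor, at the cost of requiring the date in every query.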

Sensors per region and model, with date partitions
  Partition key: Region, Model, Date
  Clustering key: Timestamp, Sensor ID
  Fields: Value
  Row layout: Region, Model, Date | (Timestamp 1, Sensor A): Value | (Timestamp 1, Sensor B): Value | ... | (Timestamp N, Sensor A): Value
  Query: select * from sensor_per_region where region = 'Region A' and model = 'DHT11' and date = '2015/09/10 17:10'
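The grouping this model implies can be sketched as follows: readings from many sensors land in one partition per (region, model, date bucket) and are ordered by (timestamp, sensor ID) inside it. The function names are hypothetical, and the minute-granularity UTC bucket is an assumption inferred from the date literal in the query above.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket(ts_seconds, fmt="%Y/%m/%d %H:%M"):
    """Date bucket for the partition key (assumed: UTC, minute granularity)."""
    return datetime.fromtimestamp(ts_seconds, tz=timezone.utc).strftime(fmt)


def group_readings(readings):
    """readings: iterable of (region, model, sensor_id, ts_seconds, value)."""
    partitions = defaultdict(list)
    for region, model, sensor_id, ts, value in readings:
        partitions[(region, model, bucket(ts))].append((ts, sensor_id, value))
    for rows in partitions.values():
        # Clustering order: timestamp first, then sensor ID
        rows.sort(key=lambda r: (r[0], r[1]))
    return dict(partitions)
```

With this layout, a single-partition read returns every sensor of one model in one region for the chosen time bucket, which suits per-region analytics better than per-sensor lookups.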

Thank you for your attention
Questions?
Andrew Shvayka, ashvayka@cybervisiontech.com
cybervisiontech.com

Fault-tolerance and horizontal scalability
  (Diagram)
  Zookeeper quorum
  Control servers (standby, active)
  Bootstrap servers
  Operations servers
  Endpoints