Time series IoT data ingestion into Cassandra using Kaa
Andrew Shvayka
ashvayka@cybervisiontech.com
Agenda
- Data ingestion challenges
- Why Kaa?
- Why Cassandra?
- Reference architecture overview
- Hands-on
  - Sandbox setup
  - Raspberry Pi application code walkthrough
  - Cassandra appender configuration
- Live demo
- Q&A
Data ingestion requirements/challenges
Must have:
- Guaranteed data delivery
- Scalability
- Security
- Performance
- Low latency
Nice to have:
- Built-in data structure validation
- Device platform independence
- Low footprint
- Low bandwidth support
Why Kaa?
- Fully featured IoT middleware platform
- 10 KB RAM footprint (with C SDK)
- Guaranteed data delivery and reliable local storage
- Built-in transport security
- Efficient data serialization
- Horizontally scalable and fault tolerant
- 100% open source (Apache License 2.0)
- Rapid application development using C / C++ / Java SDKs
- Integration with popular device platforms
Why Cassandra?
- Fault tolerant
- Performant
- Horizontally scalable
- Easy deployment
- Integration with popular analytics platforms
Problem description
[Diagram: Region 1, Region 2, Region 3, Region 4]
Reference architecture
[Diagram components:
- Raspberry Pi (several): temperature sensor (DHT11), client application, Kaa SDK
- Kaa cluster/sandbox: Kaa nodes, each with a Cassandra appender
- Cassandra
Data flow labels: Raw data, Structured Data]
Development environment setup
- Sample project repository: https://github.com/kaaproject/kaa-cassandrasample
- Kaa Sandbox: http://www./download-kaa
- Raspberry Pi: http://docs./display/kaa/raspberry+pi
Data modeling
Kaa data collection schema:
    {
      "type" : "record",
      "name" : "SensorData",
      "namespace" : "org.kaaproject.kaa.sample",
      "fields" : [
        { "name" : "sensorid", "type" : "string" },
        { "name" : "model", "type" : "string" },
        { "name" : "region", "type" : "string" },
        { "name" : "value", "type" : "float" }
      ]
    }
- Single sensor per row
- Single sensor per row, with date partitions and TTL
- Sensors per region and model, with date partitions
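The "built-in data structure validation" idea can be illustrated with a minimal sketch that checks a record against the schema above. This is plain Python and not the Kaa SDK or an Avro library; the `validate` helper and the type mapping are assumptions made for illustration, and only the two primitive types used on the slide are handled.

```python
import json

# Schema text copied from the slide.
SENSOR_DATA_SCHEMA = json.loads("""
{ "type" : "record", "name" : "SensorData",
  "namespace" : "org.kaaproject.kaa.sample",
  "fields" : [ { "name" : "sensorid", "type" : "string" },
               { "name" : "model", "type" : "string" },
               { "name" : "region", "type" : "string" },
               { "name" : "value", "type" : "float" } ] }
""")

# Assumed mapping from the Avro primitive types used above to Python types.
AVRO_TO_PYTHON = {"string": str, "float": (int, float)}


def validate(record, schema=SENSOR_DATA_SCHEMA):
    """Return a list of problems; an empty list means the record fits the schema."""
    expected = {f["name"]: f["type"] for f in schema["fields"]}
    problems = []
    for name, avro_type in expected.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, AVRO_TO_PYTHON[avro_type]):
            problems.append(f"wrong type for {name}: expected {avro_type}")
    for name in record:
        if name not in expected:
            problems.append(f"unexpected field: {name}")
    return problems
```

A reading such as `{"sensorid": "s1", "model": "DHT11", "region": "Region A", "value": 21.5}` passes, while a record without `value` is reported as incomplete.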
Single sensor per row
- Partition key: Sensor ID
- Clustering key: Timestamp
- Fields: Region, Model, Value, JSON, BLOB
Row layout: Sensor ID | Timestamp 1: Fields | Timestamp 2: Fields | ... | Timestamp N: Fields
Query: SELECT * FROM sensor_per_row WHERE sensor_id = 'Sensor 1' AND ts > 42;
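To make the partition key and clustering key roles concrete, here is a toy in-memory model of this layout. It is a sketch only: `SensorPerRowTable` and its methods are hypothetical names, and nothing here talks to Cassandra. The sensor ID selects a partition, and the timestamp orders rows inside it.

```python
from collections import defaultdict


class SensorPerRowTable:
    """Toy in-memory stand-in for the sensor_per_row layout (not a Cassandra client)."""

    def __init__(self):
        # partition key (sensor_id) -> {clustering key (ts) -> fields}
        self._partitions = defaultdict(dict)

    def insert(self, sensor_id, ts, fields):
        # Writing the same (sensor_id, ts) twice overwrites, like a CQL upsert.
        self._partitions[sensor_id][ts] = fields

    def select_after(self, sensor_id, min_ts):
        """Mimic: WHERE sensor_id = ? AND ts > ? (rows returned in ts order)."""
        partition = self._partitions.get(sensor_id, {})
        return [(ts, partition[ts]) for ts in sorted(partition) if ts > min_ts]
```

Because every reading of a sensor lands in the same partition, that partition grows without bound in this layout, which is the motivation for the date-partitioned variant.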
Single sensor per row, with date partitions and TTL
- Partition key: Sensor ID, Date
- Clustering key: Timestamp
- Fields: Region, Model, Value
- TTL: 60 sec
Row layout: Sensor ID, Date | Timestamp N: Fields | Timestamp N-1: Fields | ... | Timestamp 1: Fields
Query: SELECT * FROM sensor_per_date WHERE sensor_id = 'Sensor 1' AND date = '2015-09-10' AND ts > 42;
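A minimal sketch of the two ideas on this slide: bucketing a sensor's readings by date so that no partition grows indefinitely, and expiring rows after the TTL. The helper names are hypothetical, timestamps are assumed to be Unix seconds in UTC, and the date format is an illustrative choice. In Cassandra itself the TTL is applied by the database, so `is_live` only models the rule.

```python
from datetime import datetime, timezone

TTL_SECONDS = 60  # value from the slide


def partition_key(sensor_id, ts):
    """Build the (sensor ID, date) partition key for a reading taken at Unix time ts."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day)


def is_live(written_at, now, ttl=TTL_SECONDS):
    """A row stays readable until ttl seconds have passed since it was written."""
    return now - written_at < ttl
```

With this key, a query for one sensor and one day touches exactly one partition, and readings from the following day start a new one.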
Sensors per region and model, with date partitions
- Partition key: Region, Model, Date
- Clustering key: Timestamp, Sensor ID
- Fields: Value
Row layout: Region, Model, Date | Timestamp 1, Sensor A: Value | Timestamp 1, Sensor B: Value | ... | Timestamp N, Sensor A: Value
Query: SELECT * FROM sensor_per_region WHERE region = 'Region A' AND model = 'DHT11' AND date = '2015-09-10 17:10';
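The grouping this layout produces can be sketched in a few lines. The function name and the dictionary field names below are assumptions for illustration, not part of the sample project: readings are placed into partitions keyed by (region, model, date) and ordered inside each partition by (timestamp, sensor ID).

```python
from collections import defaultdict


def group_by_region_model_date(readings):
    """Group readings into partitions and order each partition by its clustering key."""
    partitions = defaultdict(list)
    for r in readings:
        key = (r["region"], r["model"], r["date"])  # partition key
        partitions[key].append((r["ts"], r["sensor_id"], r["value"]))
    for rows in partitions.values():
        # Clustering key: timestamp first, then sensor ID.
        rows.sort(key=lambda row: (row[0], row[1]))
    return dict(partitions)
```

Readings from different sensors of the same model and region end up side by side, so one partition read returns all of them for the chosen date bucket.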
THANK YOU FOR YOUR ATTENTION
QUESTIONS?
Andrew Shvayka
ashvayka@cybervisiontech.com
cybervisiontech.com
Fault tolerance and horizontal scalability
[Diagram components:
- Zookeeper quorum
- Control servers (active, standby)
- Bootstrap servers
- Operations servers
- Endpoints]