Case Study: Real-time Analytics With Druid
Salil Kalia, Tech Lead, TO THE NEW Digital
Agenda: Understanding the use-case (ad workflow, our use case); experiments with technologies (Redis, Cassandra); introduction to Druid (architecture, Druid in production); demo.
Understanding the use-case
What Is Analytics? Processing HISTORICAL data to understand potential trends, analyze the effects of certain decisions or events, evaluate the performance of a system, and make better business decisions.
What Is Real-time Analytics?
Understanding The Ad Workflow [Diagram: User, Publisher Server, Ad Exchange, Ad Agency-1, Ad Agency-2, Ad Agency-3; flows labelled Web Page Request, Ad Request and Ad-Content]
Examples From Our Use Case: How many times has a video been viewed in a particular time span? In a particular time span at a particular site? In a particular time span at a particular site in a particular country? In a particular time span at a particular site in a particular country on a particular device?
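Each question in the list above is the same count with one more dimension filter added. The minimal sketch below makes that concrete over a handful of in-memory events; the field names (`action`, `site`, `country`, `device`) and the sample records are hypothetical, chosen only for illustration.

```python
from datetime import datetime

# Hypothetical event records; field names and values are illustrative only.
EVENTS = [
    {"ts": datetime(2011, 1, 1, 1, 5), "action": "View", "site": "abc.com", "country": "IN", "device": "mobile"},
    {"ts": datetime(2011, 1, 1, 1, 20), "action": "View", "site": "abc.com", "country": "US", "device": "desktop"},
    {"ts": datetime(2011, 1, 1, 2, 10), "action": "View", "site": "abcd.com", "country": "IN", "device": "mobile"},
    {"ts": datetime(2011, 1, 1, 1, 30), "action": "Start", "site": "abc.com", "country": "IN", "device": "mobile"},
]


def count_views(events, start, end, **filters):
    """Count View events in [start, end) whose dimensions match every keyword filter."""
    return sum(
        1
        for e in events
        if e["action"] == "View"
        and start <= e["ts"] < end
        and all(e.get(dim) == value for dim, value in filters.items())
    )


start, end = datetime(2011, 1, 1, 1), datetime(2011, 1, 1, 2)
print(count_views(EVENTS, start, end))                                                  # 2
print(count_views(EVENTS, start, end, site="abc.com"))                                  # 2
print(count_views(EVENTS, start, end, site="abc.com", country="IN"))                    # 1
print(count_views(EVENTS, start, end, site="abc.com", country="IN", device="mobile"))   # 1
```

A full scan like this is fine for four events; with a huge, fast-arriving stream, the store itself has to answer these filtered counts quickly.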
Let's play a video ad
Video Events For The Analysis: LOAD, START, PLAYING, VIEW, STOP / PAUSE, FINISH
Event Data (Sample)
TIMESTAMP             Ad   Site      Advertiser  Event   Action
2011-01-01T01:01:27Z  123  abc.com   Brand X     Player  Load
2011-01-01T01:01:33Z  234  abcd.com  Brand Y     Player  Load
2011-01-01T01:01:40Z  123  abc.com   Brand X     Player  Start
2011-01-01T01:01:45Z  123  abc.com   Brand X     Player  Playing
2011-01-01T01:01:50Z  123  abc.com   Brand Y     Player  Playing
2011-01-01T01:01:51Z  123  abc.com   Brand X     Player  Stop
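Druid can "roll up" events like these at ingestion time: it truncates each timestamp to a query granularity and stores one pre-aggregated row per unique combination of dimension values. The plain-Python sketch below mimics that idea on the six sample events. It is not Druid code, and the hourly granularity and the choice of `site` and `action` as the grouping dimensions are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

# The six sample events: (timestamp, ad, site, advertiser, event, action).
SAMPLE = [
    ("2011-01-01T01:01:27Z", "123", "abc.com", "Brand X", "Player", "Load"),
    ("2011-01-01T01:01:33Z", "234", "abcd.com", "Brand Y", "Player", "Load"),
    ("2011-01-01T01:01:40Z", "123", "abc.com", "Brand X", "Player", "Start"),
    ("2011-01-01T01:01:45Z", "123", "abc.com", "Brand X", "Player", "Playing"),
    ("2011-01-01T01:01:50Z", "123", "abc.com", "Brand Y", "Player", "Playing"),
    ("2011-01-01T01:01:51Z", "123", "abc.com", "Brand X", "Player", "Stop"),
]

COLUMN = {"ad": 1, "site": 2, "advertiser": 3, "event": 4, "action": 5}


def roll_up(rows, dims=("site", "action")):
    """Truncate timestamps to the hour and count rows per (hour, *dims)."""
    counts = Counter()
    for row in rows:
        ts = datetime.strptime(row[0], "%Y-%m-%dT%H:%M:%SZ")
        hour = ts.replace(minute=0, second=0)
        key = (hour.isoformat(),) + tuple(row[COLUMN[d]] for d in dims)
        counts[key] += 1
    return counts


for key, n in sorted(roll_up(SAMPLE).items()):
    print(key, n)
```

Six raw events collapse into five rows, because the two `Playing` events on `abc.com` in the same hour merge into one row with a count of 2.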
Why Real-time Analytics? To understand performance in real time, control the delivery velocity, control the targeting, and avoid both over-serving and under-serving.
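Avoiding over-serving and under-serving means comparing what has been delivered so far with what should have been delivered by now, and that comparison only helps if the delivered count is current. The sketch below assumes a simple even-pacing model (the goal spread uniformly over the campaign's flight) and a hypothetical 10% tolerance. Neither comes from the original system.

```python
def pacing_status(goal, delivered, elapsed_hours, total_hours, tolerance=0.1):
    """Classify delivery against an even-pacing target (assumed model)."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    # Impressions we would expect by now if the goal were spread evenly.
    expected = goal * min(elapsed_hours / total_hours, 1.0)
    if expected == 0:
        return "over" if delivered > 0 else "on-pace"
    ratio = delivered / expected
    if ratio > 1 + tolerance:
        return "over"      # over-serving: slow delivery down
    if ratio < 1 - tolerance:
        return "under"     # under-serving: speed delivery up
    return "on-pace"


print(pacing_status(goal=1000, delivered=700, elapsed_hours=12, total_hours=24))  # over
print(pacing_status(goal=1000, delivered=300, elapsed_hours=12, total_hours=24))  # under
print(pacing_status(goal=1000, delivered=520, elapsed_hours=12, total_hours=24))  # on-pace
```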
Recap: Things We Understood. Our use-case; how ad tech works (in general); the different video player events; and that we should expect a huge amount of data arriving at a very high velocity.
Experiments with technologies
Experience From Redis: There was a huge variety of keys all over the place. It was not a good fit for time-series (big) data. Persistence was another issue, because we can't afford to lose data. It was not the right match for our use-case.
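One way the keys multiply: if every question has to be answerable from a pre-computed counter, each incoming event must increment one counter (for example with Redis `INCR`) per subset of the dimensions it can be filtered by, for every time bucket. The sketch below assumes a hypothetical key-naming scheme and four dimensions. It only generates the key names and does not talk to Redis.

```python
from itertools import combinations


def counter_keys(event, bucket, dims=("ad", "site", "country", "device")):
    """One counter key per subset of dimensions, so any filter combination has a ready count."""
    keys = []
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            parts = [f"{d}={event[d]}" for d in subset]
            keys.append(":".join(["views", bucket] + parts))
    return keys


event = {"ad": "123", "site": "abc.com", "country": "IN", "device": "mobile"}
keys = counter_keys(event, "2011-01-01T01")
print(len(keys))   # 16
print(keys[0])     # views:2011-01-01T01
print(keys[-1])    # views:2011-01-01T01:ad=123:site=abc.com:country=IN:device=mobile
```

Four dimensions already mean 2^4 = 16 increments per event per bucket. Keeping minute, hour and day buckets triples that to 48, and every new dimension doubles it again.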
Conclusion From Redis: Never blame Redis. It was too early a decision, based on our misunderstanding of the real use-case. Thanks to Redis for helping us understand our requirements early on.
Working With Cassandra: Very good support for time-series data. Extremely good at writing data at very high speed. Very easy to scale horizontally. Supports aggregations through counters.
Writing into Cassandra [Diagram: Ad Player → Analytics Server → Cassandra]
Reading from Cassandra [Diagram: Campaign Manager, Analytics Server, Cassandra]
What Didn't Work With Cassandra: Inconsistent results. Unreliable counters. No support for ad-hoc queries. Nodes crashed very frequently.
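The slides do not say why the counters were unreliable, but one well-known cause is that Cassandra counter updates are not idempotent. If a write times out, the client cannot tell whether it was applied, and a retry may count the same event twice. The toy simulation below, which is not Cassandra driver code, shows that effect and contrasts it with an idempotent write keyed by a hypothetical event id.

```python
class LossyAckStore:
    """Toy store: every write is applied, but the first acknowledgement is lost."""

    def __init__(self):
        self.counter = 0          # counter semantics: each call adds 1
        self.event_ids = set()    # idempotent alternative: one entry per event id
        self._acks_to_drop = 1

    def _maybe_drop_ack(self):
        if self._acks_to_drop > 0:
            self._acks_to_drop -= 1
            raise TimeoutError("write applied but acknowledgement lost")

    def increment(self):
        self.counter += 1             # applied before the ack is lost
        self._maybe_drop_ack()

    def insert(self, event_id):
        self.event_ids.add(event_id)  # re-inserting the same id changes nothing
        self._maybe_drop_ack()


def write_with_retry(op, retries=3):
    """Retry on timeout, as a client application typically would."""
    for _ in range(retries):
        try:
            op()
            return
        except TimeoutError:
            continue
    raise TimeoutError("gave up after retries")


counter_store = LossyAckStore()
write_with_retry(counter_store.increment)
print(counter_store.counter)         # 2: one real event counted twice

id_store = LossyAckStore()
write_with_retry(lambda: id_store.insert("evt-1"))
print(len(id_store.event_ids))       # 1: the retry is harmless
```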
Crossroads: What Next? Third-party tools on top of Cassandra for better consistency; the DataStax Enterprise edition; a deeper dive into Cassandra to reconfigure the whole architecture and setup; or switching to a different technology.
Understanding Druid
About Druid: An open-source analytics data store. Supports streaming data ingestion. Flexible filters for ad-hoc queries. Fast aggregations with sub-second queries. Distributed, shared-nothing architecture. Highly scalable.
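Ad-hoc filtering in Druid is expressed in its native JSON query language, and each use-case question becomes one more `selector` filter inside an `and` filter. The sketch below builds a `timeseries` query and posts it to a broker's `/druid/v2/` endpoint. The datasource `video_events`, the dimension names, the `count` metric and the broker address `localhost:8082` are all assumptions about a particular setup.

```python
import json
import urllib.request


def views_query(start, end, datasource="video_events", **filters):
    """Build a Druid native timeseries query counting View events (names are hypothetical)."""
    fields = [{"type": "selector", "dimension": "action", "value": "View"}]
    fields += [{"type": "selector", "dimension": d, "value": v} for d, v in filters.items()]
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": [f"{start}/{end}"],
        "filter": {"type": "and", "fields": fields},
        # Sum the ingestion-time "count" metric to recover the number of raw events.
        "aggregations": [{"type": "longSum", "name": "views", "fieldName": "count"}],
    }


def run_query(query, broker_url="http://localhost:8082/druid/v2/"):
    """POST the query to a Druid broker and return the parsed JSON response."""
    req = urllib.request.Request(
        broker_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


q = views_query("2011-01-01T00:00:00Z", "2011-01-02T00:00:00Z",
                site="abc.com", country="IN", device="mobile")
print(json.dumps(q, indent=2))
```

Narrowing the question only appends another selector, which is what "flexible filters for ad-hoc queries" means in practice. No new counters have to be maintained.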
Setting Up Druid In Production [Diagram: Ad Player, Analytics Server, Kafka (cluster), Druid cluster, Cassandra]
Druid's Reliability Check [Diagram: Ad Player, Analytics Server, Kafka (cluster), Druid cluster, a raw-file consumer writing raw files, and a job to test Druid's integrity]
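An integrity job like this can recount the raw files independently and compare the result with what Druid reports for the same buckets. The sketch below assumes the raw files are JSON lines with `timestamp` and `action` fields, and that the Druid side has already been fetched (for example with a groupBy query) into a dict keyed by `(hour, action)`. Both formats are hypothetical.

```python
import json
from collections import Counter


def count_raw(lines):
    """Count raw JSON-lines events per (hour, action)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        hour = event["timestamp"][:13]        # "YYYY-MM-DDTHH"
        counts[(hour, event["action"])] += 1
    return counts


def reconcile(raw_counts, druid_counts):
    """Return {key: (raw, druid)} for every bucket whose counts differ."""
    keys = set(raw_counts) | set(druid_counts)
    return {
        k: (raw_counts.get(k, 0), druid_counts.get(k, 0))
        for k in sorted(keys)
        if raw_counts.get(k, 0) != druid_counts.get(k, 0)
    }


raw_lines = [
    '{"timestamp": "2011-01-01T01:01:27Z", "action": "Load"}',
    '{"timestamp": "2011-01-01T01:01:33Z", "action": "Load"}',
    '{"timestamp": "2011-01-01T01:01:40Z", "action": "Start"}',
]
druid = {("2011-01-01T01", "Load"): 2}   # hypothetical result: Start is missing
print(reconcile(count_raw(raw_lines), druid))
# {('2011-01-01T01', 'Start'): (1, 0)}
```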
A Quick Demo
Druid Architecture [Diagram: Druid nodes (real-time, coordinator, broker, historical) and external dependencies (MySQL for metadata, ZooKeeper, deep storage for data/segments); streaming data flows into the real-time nodes and client queries go to the broker nodes]
Druid Data Ingestion [Diagram: Druid architecture (real-time, coordinator, broker and historical nodes; MySQL, ZooKeeper, deep storage)]
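In this architecture, real-time nodes ingest the stream, keep recent data queryable in memory, and build an immutable segment for each segment-granularity interval. Once the interval has closed and a grace period for late events (the `windowPeriod`) has passed, the segment is pushed to deep storage, its metadata is recorded in MySQL, and a historical node takes it over. The sketch below is a heavy simplification that assumes hourly segments and a 10-minute window. It only decides which segments are ready for handoff.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_hour(ts):
    """Truncate a timestamp to the start of its hourly segment."""
    return ts.replace(minute=0, second=0, microsecond=0)


def build_segments(timestamps):
    """Group event timestamps into in-memory segments keyed by segment start."""
    segments = defaultdict(list)
    for ts in timestamps:
        segments[bucket_hour(ts)].append(ts)
    return dict(segments)


def ready_for_handoff(segments, now, window=timedelta(minutes=10),
                      granularity=timedelta(hours=1)):
    """Segments whose interval has closed and whose late-data window has passed."""
    return sorted(start for start in segments if start + granularity + window <= now)


segs = build_segments([datetime(2011, 1, 1, 1, 5),
                       datetime(2011, 1, 1, 1, 50),
                       datetime(2011, 1, 1, 2, 3)])
print(ready_for_handoff(segs, now=datetime(2011, 1, 1, 2, 15)))  # [datetime.datetime(2011, 1, 1, 1, 0)]
print(ready_for_handoff(segs, now=datetime(2011, 1, 1, 2, 5)))   # []
```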
Druid Data Ingestion [Diagram: Ad Player → Analytics Server → Kafka (cluster) → Druid real-time node]
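For Druid to ingest from Kafka, the analytics server's messages and Druid's `dataSchema` have to agree on the timestamp column and the dimensions. The sketch below pairs a message serializer with a matching schema fragment. Field names follow current Druid documentation; older real-time node specs nest `timestampSpec` and `dimensionsSpec` inside a `parser` block. The datasource name, dimension list and granularities are assumptions. No Kafka client is used: the function only produces the bytes that would be published.

```python
import json

# Hypothetical dimensions, mirroring the columns of the sample event table.
DIMENSIONS = ["ad", "site", "advertiser", "event", "action"]

# Sketch of a Druid dataSchema for these messages (values are assumptions).
DATA_SCHEMA = {
    "dataSource": "video_events",
    "timestampSpec": {"column": "timestamp", "format": "iso"},
    "dimensionsSpec": {"dimensions": DIMENSIONS},
    # One "count" metric per rolled-up row; query with longSum over "count".
    "metricsSpec": [{"type": "count", "name": "count"}],
    "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "MINUTE",
        "rollup": True,
    },
}


def to_message(timestamp, **dims):
    """Serialize one player event as a UTF-8 JSON message for a Kafka topic."""
    missing = [d for d in DIMENSIONS if d not in dims]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return json.dumps({"timestamp": timestamp, **dims}).encode("utf-8")


msg = to_message("2011-01-01T01:01:27Z", ad="123", site="abc.com",
                 advertiser="Brand X", event="Player", action="Load")
print(msg.decode("utf-8"))
```

With roll-up enabled, Druid counts how many raw messages fell into each row. Queries therefore sum that column (a `longSum`) rather than counting rows.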
Druid Data Retrieval [Diagram: Druid architecture (real-time, coordinator, broker and historical nodes; MySQL, ZooKeeper, deep storage)]
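At query time, the broker learns from ZooKeeper which nodes serve which segments. It sends the part of the query covering handed-off data to historical nodes and the most recent part to real-time nodes, then merges the partial results. The sketch below reduces that segment timeline to a single handoff boundary, which is a simplification for illustration only.

```python
from datetime import datetime


def split_interval(start, end, handoff_boundary):
    """Split [start, end) into a historical part and a real-time part at the handoff boundary."""
    historical = (start, min(end, handoff_boundary)) if start < handoff_boundary else None
    realtime = (max(start, handoff_boundary), end) if end > handoff_boundary else None
    return {"historical": historical, "realtime": realtime}


def merge_partial_sums(*partials):
    """Broker-side merge for a sum aggregation: add the per-node partial results."""
    return sum(partials)


plan = split_interval(datetime(2011, 1, 1, 0), datetime(2011, 1, 1, 3),
                      handoff_boundary=datetime(2011, 1, 1, 2))
print(plan["historical"])   # covers 00:00 to 02:00
print(plan["realtime"])     # covers 02:00 to 03:00
print(merge_partial_sums(120, 35))  # 155
```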
Druid Data Coordination [Diagram: Druid architecture (real-time, coordinator and historical nodes; MySQL, ZooKeeper, deep storage)]
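The coordinator reads segment metadata from MySQL and decides, through ZooKeeper, which historical nodes should load or drop each segment, while honouring the replication rules. Druid's actual balancer uses a cost-based strategy. The sketch below swaps in a much simpler greedy "least-loaded nodes first" placement, just to show the shape of the decision. Segment and node names are hypothetical.

```python
import heapq


def assign_segments(segments, nodes, replicas=1):
    """Greedily place each segment's replicas on the least-loaded distinct historical nodes."""
    if replicas > len(nodes):
        raise ValueError("not enough historical nodes for the requested replication")
    load = {n: [] for n in nodes}
    for seg in segments:
        # Choose the `replicas` nodes holding the fewest segments (ties broken by name).
        targets = heapq.nsmallest(replicas, nodes, key=lambda n: (len(load[n]), n))
        for n in targets:
            load[n].append(seg)
    return load


print(assign_segments(["s1", "s2", "s3", "s4"], ["h1", "h2"]))
# {'h1': ['s1', 's3'], 'h2': ['s2', 's4']}
print(assign_segments(["s1", "s2", "s3"], ["h1", "h2", "h3"], replicas=2))
```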
COMPANIES USING DRUID
Questions?