Real-time Analytics: How it Started
Battlefield 3 Player Statistics
EA collected 50 TB/day in 2013.
Available player stats sites: http://battlelog.battlefield.com and http://bf3stats.com
Features: per gun/vehicle/class leaderboards etc.
Geo-leaderboards were introduced when Battlefield 4 was released in November 2013.
Lacks interesting analysis!
Harvested Player Data from bf3stats.com
Roughly 2 million player records
Each player record has 1076 fields
Effectively a spreadsheet with 2 billion cells
Details:
  Each player record has a country field.
  Each player record has fields for all assault rifles: AK-74, M416, M16, AEK-971, F2000, FAMAS, AUG-A3, KH-2002, AN-94, G3A3, SCAR-L, L85A2
Question
For each country and assault rifle: what percentage of players have that rifle as their favorite assault rifle?
bf3stats (MongoDB): >1 h
BioCAM RAW: 37 milliseconds
[Chart: Favorite Assault Rifle, query time in Log10(milliseconds): bf3stats (MongoDB) 6,56; BioCAM RAW 1,57]
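The question above can be illustrated with a minimal Python sketch. It is not how bf3stats or BioCAM compute the answer. The records, the shortened rifle list and the definition of "favorite" (the rifle with the highest usage score) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Shortened rifle list for the sketch; the real records have twelve rifles.
RIFLES = ["AK-74", "M416", "M16"]

# Hypothetical player records: a country plus a usage score per rifle.
players = [
    {"country": "Sweden", "AK-74": 10, "M416": 250, "M16": 40},
    {"country": "Sweden", "AK-74": 5, "M416": 20, "M16": 300},
    {"country": "Sweden", "AK-74": 0, "M416": 90, "M16": 15},
    {"country": "France", "AK-74": 70, "M416": 10, "M16": 10},
]


def favorite_rifle(player):
    """Assumed definition: the rifle with the highest usage score."""
    return max(RIFLES, key=lambda rifle: player.get(rifle, 0))


def favorite_shares(records):
    """Return {country: {rifle: percent of that country's players}}."""
    counts = defaultdict(Counter)
    for player in records:
        counts[player["country"]][favorite_rifle(player)] += 1
    shares = {}
    for country, counter in counts.items():
        total = sum(counter.values())
        shares[country] = {r: 100.0 * counter[r] / total for r in RIFLES}
    return shares


shares = favorite_shares(players)
```

Each player contributes exactly one favorite, so the percentages for a country sum to 100.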
Extract from the Analysis
country_name        AK-74    M416     M16      AEK-971  F2000    FAMAS    AUG-A3   KH-2002  AN-94    G3A3     SCAR-L   L85A2
Sweden              12,31%   20,98%   27,32%   19,13%   7,43%    3,65%    2,26%    1,87%    1,20%    2,11%    0,39%    1,34%
United States       11,19%   23,68%   25,80%   16,53%   8,05%    4,26%    2,63%    1,71%    1,45%    2,26%    0,62%    1,83%
Russian Federation  22,95%   12,96%   22,35%   26,44%   6,09%    1,85%    1,85%    1,76%    1,57%    1,18%    0,35%    0,66%
France              11,72%   17,02%   33,34%   14,88%   8,79%    6,71%    2,15%    1,79%    0,90%    1,34%    0,35%    1,01%
United Kingdom      13,34%   21,40%   26,52%   16,34%   7,68%    4,03%    2,45%    1,65%    1,05%    1,72%    0,43%    3,40%
Conclusion: Players have a preference for weapons used by their country's armed forces! For example, the FAMAS share is highest in France, the L85A2 share is highest in the United Kingdom, and the AK-74 share is highest in the Russian Federation.
Conclusion
Sufficient reporting speed to handle high-velocity data flows
Fast enough to perform analysis in real time, on the fly
BioCAM Web Service
Core BioCAM Analytics Engine
Duda Web Services Framework (http://duda.io)
Monkey Web Server (http://monkey-project.com)
HTTP(S)/JSON Web Service Interface
Create multiple BioCAM instances with different schemas
Arbitrarily deep breakdowns for various kinds of analysis
Each breakdown serves multiple aggregates
Drill-downs natively supported from the Web Service API
[Diagram: HTTP/JSON, Duda, Monkey, BioCAM]
RTDS (Real-Time Data Storage)
NoSQL graph database to persistently store generic interconnected objects in an application
Linked directly into the application to store its state
Designed for telecom requirements: 24/7, always low latency (no maintenance windows!), 1+1 mirroring, fast switchover and failover, upgrades at runtime
Side effect: low overhead and energy efficient
[Diagram: HTTP/JSON, Duda, Monkey, BioCAM, RTDS]
Real-Time Data Storage (RTDS)
Persistent NoSQL graph database
Stores generic interconnected objects in an application
Linked directly into the application to store its state
Low overhead
Energy efficient
Real-Time Data Storage cont.
Designed for telecom requirements
  24/7, always low latency
  No maintenance windows
  1+1 mirroring
  Fast switchover and failover
  Upgrades at runtime
RTDS Internal Workings
Data is stored as a transaction log
  Proven method; provides atomic transactions, audit history and correctly ordered updates in the hot standby instance
  Robust in crash scenarios (corruption at the end of the log only)
Self-rotating transaction log
  No checkpointing (as it introduces latency and peaks in CPU/RAM usage)
  A background traversal visits all objects and writes their latest state to the log; when the traversal is complete, the log is rotated
  ~1% of CPU, no latency peaks, no resource peaks; only the last two logs are required for restoring the complete state
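The self-rotating log can be illustrated with a toy Python model. This is not RTDS code. The class and method names are hypothetical, lists stand in for log files, everything runs in a single thread, and object deletion is not modelled. The sketch only shows why the last two logs are enough to restore the complete state when a background traversal re-logs every object before each rotation.

```python
class RotatingLogStore:
    """Toy in-memory model of a self-rotating transaction log (hypothetical)."""

    def __init__(self):
        self.objects = {}        # live state: object id -> value
        self.previous_log = []   # most recently rotated log
        self.current_log = []    # log currently being appended to
        self._pending = []       # objects the traversal has yet to visit

    def commit(self, obj_id, value):
        """Apply an update and append it to the current log."""
        self.objects[obj_id] = value
        self.current_log.append((obj_id, value))

    def traverse_step(self, batch=1):
        """Do a small slice of background work: re-log a few objects."""
        if not self._pending:
            self._pending = list(self.objects)  # start a new traversal pass
        for _ in range(min(batch, len(self._pending))):
            obj_id = self._pending.pop()
            self.current_log.append((obj_id, self.objects[obj_id]))
        if not self._pending:
            # Every object has been written to current_log during this pass,
            # so older logs are no longer needed: rotate.
            self.previous_log, self.current_log = self.current_log, []

    def restore(self):
        """Rebuild the state by replaying only the last two logs in order."""
        state = {}
        for obj_id, value in self.previous_log + self.current_log:
            state[obj_id] = value
        return state


store = RotatingLogStore()
store.commit("a", 1)
store.commit("b", 2)
store.commit("a", 3)
store.traverse_step()   # visits one object
store.commit("b", 5)    # updates keep arriving during the traversal
store.traverse_step()   # visits the last object, then rotates the log
store.commit("c", 7)
restored = store.restore()
```

Because the traversal proceeds in small slices between commits, the work is spread out instead of concentrated in a checkpoint.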
Real-Time Data Storage cont.
Default operation: asynchronous, without locks
  Lock-free algorithms to get and commit transaction buffers
  Background threads for log flushing and mirroring
  Avoids latency and priority inversions
  Locks are engaged in overload situations
Overhead: one RAM copy per object
  Used for background traversal, verifying state consistency, etc.
Three companies, one binary!
Monkey Software Company: Duda, Monkey
Oricane AB: BioCAM
Xarepo AB: RTDS
BioCAM Internal Representation
Records consist of value fields and class fields
Value fields are typically numbers (price, quantity, temperature etc.)
Three types of class fields
  Explicit: color, brand, country etc.
  Implicit: timestamp falling within hour, week, month etc.
  Synthetic: favourite assault rifle
Class field values are mapped to unsigned integers
Master key built by packing class fields into a large unsigned integer
[Diagram: master key packed from class fields 1 to 5]
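Packing class fields into a master key can be sketched as follows. The bit widths, function names and field order are assumptions for illustration, not the BioCAM layout. Python integers are unbounded, so the "large unsigned integer" needs no special type here.

```python
def pack_key(values, widths):
    """Pack per-field unsigned integers into one integer.

    The first field ends up in the most significant bits.
    """
    if len(values) != len(widths):
        raise ValueError("one width is required per field value")
    key = 0
    for value, width in zip(values, widths):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        key = (key << width) | value
    return key


def unpack_key(key, widths):
    """Inverse of pack_key: recover the per-field integers."""
    values = []
    for width in reversed(widths):
        values.append(key & ((1 << width) - 1))
        key >>= width
    return values[::-1]


# Hypothetical mapping of class field values to unsigned integers.
BRAND = {"Audi": 0, "Ford": 1, "Volvo": 2}
COLOR = {"Blue": 0, "Red": 1, "White": 2}
COUNTRY = {"Denmark": 0, "Finland": 1, "Norway": 2, "Sweden": 3}
WIDTHS = [2, 2, 2]  # assumed: two bits are enough for each field here

master_key = pack_key(
    [BRAND["Audi"], COLOR["White"], COUNTRY["Finland"]], WIDTHS
)
```

Comparing two such keys as plain integers compares the fields in order of significance, which is what makes sorting by key useful.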
Breakdown
Multi-branch tree structure
Each level corresponds to a unique class field
  Not all class fields need to be present
Branches correspond to class field values
The sequence of branches (field values) traversed from root to leaf is called a path
Records matching a path are recorded in the corresponding leaf
Breakdown Construction
For each record a handle is created
Each handle contains a reference to the record and a slave key
The slave key is an integer representation of the path, where field values from higher levels are stored in more significant bits
The array of handles is sorted by increasing slave key
The implicit tree structure is built bottom-up from the sorted array
Computational complexity is dominated by sorting!
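The construction steps above can be sketched in Python. This is an illustration under stated assumptions, not the BioCAM implementation: every level uses a fixed two bits, a handle is a (slave key, record index) pair, and the tree is returned as one list of nodes per level. All names are hypothetical.

```python
from itertools import groupby

LEVEL_BITS = 2  # assumed: every level's field value fits in two bits


def slave_key(path_values):
    """Integer form of a path; higher levels occupy more significant bits."""
    key = 0
    for value in path_values:
        key = (key << LEVEL_BITS) | value
    return key


def build_breakdown(records, levels):
    """Return one list of (key, children) pairs per level, top level first."""
    # One handle per record: (slave key, reference to the record).
    handles = sorted(
        (slave_key([record[field] for field in levels]), index)
        for index, record in enumerate(records)
    )
    # Leaves: runs of handles with equal slave keys.
    leaves = [
        (key, [index for _, index in group])
        for key, group in groupby(handles, key=lambda h: h[0])
    ]
    tree = [leaves]
    current = leaves
    for _ in range(len(levels) - 1):
        # Siblings are adjacent because the keys are sorted, so each parent
        # is a run of children sharing the same key prefix.
        parents = [
            (prefix, [key for key, _ in group])
            for prefix, group in groupby(
                current, key=lambda node: node[0] >> LEVEL_BITS
            )
        ]
        tree.append(parents)
        current = parents
    return tree[::-1]


# Integer-encoded records (brand, color, country), values are hypothetical.
records = [
    {"brand": 0, "color": 2, "country": 1},
    {"brand": 0, "color": 2, "country": 1},
    {"brand": 0, "color": 0, "country": 3},
    {"brand": 2, "color": 1, "country": 3},
]
tree = build_breakdown(records, ["brand", "color", "country"])
```

After the sort, every level is built in a single linear pass over the level below it, so the sort is the only step that is more than linear.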
Aggregates
Zero or more aggregates are associated with each breakdown
Aggregate values are associated with breakdown nodes and leaves
Aggregate functions are associated with breakdown levels
Leaf aggregate values are computed from value fields in the records using the leaf aggregate function
Node aggregate values are computed from the children's aggregate values using the node aggregate function
Typically only one value field in the records is considered
Typically aggregate functions are identical between levels
Example
Country: Sweden (S), Finland (F), Denmark (D), Norway (N)
Brand: Audi (A), Ford (F), Volvo (V)
Color: White (W), Red (R), Blue (B)
Breakdown: Brand, Color, Country
Aggregate: Sales
Example
[Tree diagram: Brand level (A, F, V); Color level (B, R, W) under each brand; Country level (D, F, N, S) under each color. Highlighted path: Audi, White, Finland]
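A Sales aggregate over the Brand, Color, Country breakdown can be sketched as below. The sales figures and the function name are made up for illustration. The sketch uses sum for both the leaf and the node aggregate function, matching the typical case where the functions are identical between levels.

```python
from collections import defaultdict


def aggregate(records, levels, value_field, leaf_fn=sum, node_fn=sum):
    """Return {path: aggregate value} for every leaf and node.

    A path is a tuple of field values; the empty tuple is the root.
    """
    # Leaf values come from the value field of the matching records.
    leaf_inputs = defaultdict(list)
    for record in records:
        path = tuple(record[level] for level in levels)
        leaf_inputs[path].append(record[value_field])
    current = {path: leaf_fn(values) for path, values in leaf_inputs.items()}
    result = dict(current)
    # Node values come from the children's aggregate values, level by level.
    for _ in levels:
        child_values = defaultdict(list)
        for path, value in current.items():
            child_values[path[:-1]].append(value)
        current = {
            path: node_fn(values) for path, values in child_values.items()
        }
        result.update(current)
    return result


# Hypothetical sales records.
sales_records = [
    {"brand": "Audi", "color": "White", "country": "Finland", "sales": 3},
    {"brand": "Audi", "color": "White", "country": "Finland", "sales": 2},
    {"brand": "Audi", "color": "Blue", "country": "Sweden", "sales": 1},
    {"brand": "Volvo", "color": "Red", "country": "Sweden", "sales": 5},
]
totals = aggregate(sales_records, ("brand", "color", "country"), "sales")
```

Looking up `totals[("Audi", "White")]` and then `totals[("Audi", "White", "Finland")]` corresponds to a drill-down along the highlighted path.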
Traditional Analytics in Retail
1. E-receipts sent to Data Warehouse
2. Analysis of new and historical data
3. Infrequent reports (once per week etc.)
Data not relevant to what's happening now is involved in the analysis
Real-time On-the-fly Analytics in Retail
1. E-receipts sent to Data Warehouse
2. E-receipts intercepted/sent in real time to BioCAM WS
3. Analysis performed on the fly
4. Reporting in real time
Real-time monitoring, analysis and reporting with minimum stress on the data warehouse
Whatever Mart, Inc.: The Multi Tera Dollar Retail Corporation
1500 stores distributed across the globe, open 10:00-18:00
15 000 unique products when taking size, color etc. into account
Customers purchase an average of 30 random products in each open store every second
At peak rate, 2300 customers purchase 45 000 products per second, thus surpassing 500 000 USD per second in net sales
E-receipts are reported immediately to BioCAM Web Service
Five different analyses are performed every ten seconds
Reports are presented on a dashboard and updated in real time
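The peak figures can be cross-checked with a few lines of arithmetic. The sketch assumes that the peak corresponds to all 1500 stores being open at once, which the slide does not state explicitly. The derived per-customer and per-product averages are not from the slide.

```python
stores = 1500
products_per_store_per_second = 30
peak_customers_per_second = 2300
peak_net_sales_usd_per_second = 500_000

# Assumption: at peak, every store is open.
peak_products_per_second = stores * products_per_store_per_second

# Derived averages (not stated on the slide).
products_per_customer = peak_products_per_second / peak_customers_per_second
usd_per_product = peak_net_sales_usd_per_second / peak_products_per_second

print(peak_products_per_second)
print(round(products_per_customer, 1))
print(round(usd_per_product, 2))
```

Under that assumption the product of stores and per-store rate reproduces the stated 45 000 products per second.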
Whatever Mart, Inc.: The Multi Tera Dollar Retail Corporation
Almost 1000 billion transactions since launch
whatever.oricane.com
Benchmarks
Configurations:
  Web Service: access via Web Service front-end
  Direct Access: test program linked with BioCAM, access via C API
  Stripped: direct access to BioCAM stripped of RTDS
Four different database sizes (number of records)
Six different transaction loads (record updates per second)
Aggregate Value Re-calculation Time
2500-3000 record transactions per second
Re-calculation speed not dependent on transactions/second
Measured in milliseconds, one row per database size:
Web Service   Direct Access   Stripped
35            31              29
167           153             133
804           711             650
1580          1429            1302
Transaction Time
Web Service                Direct Access              Stripped
Load (x/s)   Time (us)     Load (x/s)   Time (us)     Load (x/s)   Time (us)
454          1201          407          183           483          144
1463         1824          1246         161           1464         125
2510         2684          2036         143           2275         118
2930         3064          2408         132           2772         109
4568         32150         3414         128           4107         100
5975         235471        4583         120           5742         91
[Figure: Direct Access]
[Figure: Stripped]
Conclusion
Aggregate value re-calculation cost is linear in database size; this is expected since the optimized re-calculation scheme is not yet implemented
Transaction cost is completely dominated by the Web Service front-end, especially at higher load
  It would be interesting to bypass the web server and run JSON over IP
Transaction cost for Direct Access and Stripped decreases with higher load, most likely due to reduced context switching and higher cache locality
Key Application Area: Gaming
Counter Strike Global Offensive (CSGO)
Real-time statistics site to be launched
Currently 150 000 players online simultaneously
Player base grows exponentially
Partnership with the world #1 CSGO team, Ninjas in Pyjamas (www.nip.gl)
Image source: http://www.pcgamer.com/valve-explains-how-csgo-became-the-second-most-played-game-on-steam/
Key Application Area: Energy
Oricane is involved in Cloudberry Datacenters (http://www.cloudberry-datacenters.com)
  Focus is on energy savings in data centers; discussions are slow
Oricane wants to address:
  Energy production
  Energy trading
  Embedded applications
Looking for a fast-paced key partner with lots of data to process
  Pilot project: value creation from ultra-high analytics performance