Big Data Router for Real-Time Analytics

Real-time Analytics: How it Started

Battlefield 3 Player Statistics
EA collected 50 TB/day in 2013.
Available player stats sites: http://battlelog.battlefield.com and http://bf3stats.com
Features: per gun/vehicle/class leaderboards etc.
Geo-leaderboards were introduced when Battlefield 4 was released in November 2013.
Lacks interesting analysis!

Harvested Player Data from bf3stats.com
Roughly 2 million player records
Each player record has 1076 fields
Effectively a spreadsheet with 2 billion cells
Details:
  Each player record has a field country.
  Each player record has fields for all assault rifles: AK-74, M416, M16, AEK-971, F2000, FAMAS, AUG-A3, KH-2002, AN-94, G3A3, SCAR-L, L85A2

Question
For each country and assault rifle: what percentage of players have that rifle as their favorite assault rifle?
bf3stats (MongoDB): >1 h
BioCAM RAW: 37 milliseconds
[Chart: Favorite Assault Rifle, query time in log10(milliseconds): bf3stats (MongoDB) 6,56; BioCAM RAW 1,57]
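The question can be illustrated with a minimal Python sketch. The record layout (a country field plus one usage figure per rifle) and the rule that a player's favorite is the rifle with the highest figure are assumptions made for illustration; the slides do not state how bf3stats or BioCAM define or compute the favorite. The three players are made-up data.

```python
from collections import Counter, defaultdict

RIFLES = ["AK-74", "M416", "M16"]  # shortened list, for illustration only


def favourite_rifle(player):
    # Assumption: the favourite is the rifle with the highest usage figure.
    return max(RIFLES, key=lambda rifle: player["rifles"].get(rifle, 0))


def favourite_share_by_country(players):
    """Return {country: {rifle: percent of players with that favourite}}."""
    counts = defaultdict(Counter)
    for player in players:
        counts[player["country"]][favourite_rifle(player)] += 1
    shares = {}
    for country, counter in counts.items():
        total = sum(counter.values())
        shares[country] = {r: 100.0 * counter[r] / total for r in RIFLES}
    return shares


# Made-up sample records (hypothetical field names and values).
players = [
    {"country": "Sweden", "rifles": {"AK-74": 10, "M416": 50, "M16": 20}},
    {"country": "Sweden", "rifles": {"AK-74": 5, "M416": 1, "M16": 30}},
    {"country": "France", "rifles": {"AK-74": 0, "M416": 2, "M16": 9}},
]
shares = favourite_share_by_country(players)
```

The favorite rifle is derived from other fields rather than stored directly, which is why it needs a separate step before counting.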

Extract from the Analysis

country_name        AK-74    M416     M16      AEK-971  F2000    FAMAS    AUG-A3   KH-2002  AN-94    G3A3     SCAR-L   L85A2
Sweden              12,31%   20,98%   27,32%   19,13%   7,43%    3,65%    2,26%    1,87%    1,20%    2,11%    0,39%    1,34%
United States       11,19%   23,68%   25,80%   16,53%   8,05%    4,26%    2,63%    1,71%    1,45%    2,26%    0,62%    1,83%
Russian Federation  22,95%   12,96%   22,35%   26,44%   6,09%    1,85%    1,85%    1,76%    1,57%    1,18%    0,35%    0,66%
France              11,72%   17,02%   33,34%   14,88%   8,79%    6,71%    2,15%    1,79%    0,90%    1,34%    0,35%    1,01%
United Kingdom      13,34%   21,40%   26,52%   16,34%   7,68%    4,03%    2,45%    1,65%    1,05%    1,72%    0,43%    3,40%

Conclusion: Players have a preference for weapons used by their country's armed forces!

Conclusion
Sufficient reporting speed to handle high-velocity data flows
Fast enough to perform analysis in real-time, on-the-fly
BioCAM Web Service

BioCAM Web Service
Core: BioCAM Analytics Engine
Duda Web Services Framework (http://duda.io)
Monkey Web Server (http://monkey-project.com)
HTTP(S)/JSON Web Service Interface
Create multiple BioCAM instances with different schemes
Arbitrarily deep breakdowns for various kinds of analysis
Each breakdown serves multiple aggregates
Drill-downs natively supported from the Web Service API
[Diagram labels: HTTP/JSON, Duda, Monkey, BioCAM]

RTDS (Real-Time Data Storage)
NoSQL graph database to persistently store generic interconnected objects in an application
Linked directly into the application to store its state
Designed for telecom requirements: 24/7 always low latency (no maintenance windows!), 1+1 mirroring, fast switchover and failover, upgrades in runtime
Side effect: low overhead and energy efficient
[Diagram labels: HTTP/JSON, Duda, Monkey, BioCAM, RTDS]

Real-Time Data Storage (RTDS)
Persistent NoSQL graph database
Stores generic interconnected objects in an application
Linked directly into the application to store its state
Low overhead
Energy efficient

Real-Time Data Storage cont.
Designed for telecom requirements
  24/7 always low latency
  No maintenance windows
  1+1 mirroring
  Fast switchover and failover
  Upgrades in runtime

RTDS Internal Workings
Data is stored as a transaction log
  Proven method; provides atomic transactions, audit history and correctly ordered updates in the hot standby instance
  Robust in crash scenarios (corruption at the end of the log only)
Self-rotating transaction log
  No checkpointing (as it introduces latency and peaks in CPU/RAM resources)
  Background traversal of all objects writes the latest state to the log; when the traversal is complete, the log is rotated
  ~1% of CPU, no latency peaks, no resource peaks, only the last two logs required for restoring complete state
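The self-rotating log can be sketched in a few lines of Python. This is only an in-memory illustration of the idea described above, not the RTDS implementation: the class name RotatingLog, its methods and the restore helper are hypothetical, and threading, mirroring and disk I/O are left out.

```python
class RotatingLog:
    """Toy model: updates are appended to the current log while a
    background traversal re-writes the latest state of every object.
    When the traversal completes, the log is rotated."""

    def __init__(self):
        self.state = {}      # live objects: key -> latest value
        self.previous = []   # last completed log
        self.current = []    # log currently being written
        self._pending = []   # objects the traversal still has to visit

    def commit(self, key, value):
        # A transaction updates the object and is appended to the log.
        self.state[key] = value
        self.current.append((key, value))

    def traverse_step(self, max_objects=1):
        # Visit a few objects per call, so the work is spread out
        # instead of causing a checkpoint-style peak.
        if not self._pending:
            self._pending = list(self.state)
        for _ in range(min(max_objects, len(self._pending))):
            key = self._pending.pop()
            self.current.append((key, self.state[key]))
        if not self._pending:
            # Traversal complete: rotate the log.
            self.previous, self.current = self.current, []


def restore(previous, current):
    # Replaying the last two logs in order rebuilds the complete state.
    state = {}
    for key, value in previous + current:
        state[key] = value
    return state


log = RotatingLog()
log.commit("a", 1)
log.commit("b", 2)
log.traverse_step(max_objects=2)  # traversal completes, log rotates
log.commit("a", 3)
log.traverse_step(max_objects=1)  # traversal in progress, no rotation yet
restored = restore(log.previous, log.current)
```

Because every object that existed when a traversal started is written to the log before that log is rotated, the completed log holds the full state at rotation time and the current log holds everything after it, which is why older logs can be discarded.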

Real-Time Data Storage cont.
Default operation: asynchronous without locks
  Lock-free algorithms to get and commit transaction buffers
  Background threads for log flushing and mirroring
  Avoids latency and priority inversions
Locks will be engaged in overload situations
Overhead: one RAM copy per object
  For background traversal, verifying state consistency etc.

Three companies, one binary!
Monkey Software Company: Duda, Monkey
Oricane AB: BioCAM
Xarepo AB: RTDS

BioCAM Internal Representation
Records consist of value fields and class fields
Value fields are typically numbers (price, quantity, temperature etc.)
Three types of class fields:
  Explicit: color, brand, country etc.
  Implicit: timestamp falling within hour, week, month etc.
  Synthetic: favourite assault rifle
Class field values are mapped to unsigned integers
Master key built by packing class fields into a large unsigned integer
[Diagram: master key layout: Class field 4, Class field 1, Class field 5, Class field 2, Class field 3]
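A minimal Python sketch of the two steps named above: mapping class field values to unsigned integers and packing them into one large integer. The FieldCodec class, the function names and the fixed bit width per field are assumptions for illustration; the slides do not describe BioCAM's actual encoding.

```python
class FieldCodec:
    """Maps the values of one class field to small unsigned integers."""

    def __init__(self):
        self.codes = {}

    def encode(self, value):
        # First-seen values get the next free code: 0, 1, 2, ...
        return self.codes.setdefault(value, len(self.codes))


def pack_master_key(values, widths):
    """Pack integer field codes into one integer; earlier fields end up
    in more significant bits. widths[i] is the assumed bit width of field i."""
    key = 0
    for value, width in zip(values, widths):
        if not 0 <= value < (1 << width):
            raise ValueError("field code does not fit in its bit width")
        key = (key << width) | value
    return key


def unpack_master_key(key, widths):
    """Inverse of pack_master_key."""
    values = []
    for width in reversed(widths):
        values.append(key & ((1 << width) - 1))
        key >>= width
    return values[::-1]


country = FieldCodec()
sweden = country.encode("Sweden")
finland = country.encode("Finland")

widths = [2, 2, 2]  # hypothetical: three class fields, 2 bits each
key = pack_master_key([1, 2, 3], widths)
fields = unpack_master_key(key, widths)
```

Python integers are unbounded, so the "large unsigned integer" needs no special type here; in C it would have to fit a fixed-width or multi-word integer.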

Breakdown
Multi-branch tree structure
Each level corresponds to a unique class field
Not all class fields need to be present
Branches correspond to class field values
The sequence of branches (field values) traversed from root to leaf is called a path
Records matching a path are recorded in the corresponding leaf

Breakdown Construction
For each record a handle is created
Each handle contains a reference to the record and a slave key
The slave key is an integer representation of the path, where field values from higher levels are stored in more significant bits
The array of handles is sorted by increasing slave key
The implicit tree structure is built bottom-up from the sorted array
Computational complexity is dominated by sorting!
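The construction steps above can be sketched in Python as follows. This is an illustration under assumptions, not BioCAM's code: records are dictionaries of already-encoded class field values, each field has an assumed fixed bit width, and a handle is simply a (slave key, record index) pair.

```python
from itertools import groupby


def slave_key(record, level_fields, widths):
    """Pack the record's field codes along the breakdown levels;
    higher levels end up in more significant bits."""
    key = 0
    for field in level_fields:
        key = (key << widths[field]) | record[field]
    return key


def build_breakdown(records, level_fields, widths):
    """Return one list of nodes per level, top level first.
    Leaves are (slave key, [record indices]); inner nodes are
    (key prefix, [child keys])."""
    # One handle per record, sorted by increasing slave key.
    handles = sorted(
        (slave_key(rec, level_fields, widths), i)
        for i, rec in enumerate(records)
    )
    # Leaves: runs of equal slave keys in the sorted array.
    nodes = [
        (key, [i for _, i in run])
        for key, run in groupby(handles, key=lambda h: h[0])
    ]
    levels = [nodes]
    # Bottom-up: drop the lowest level's bits and merge equal prefixes.
    for field in reversed(level_fields[1:]):
        shift = widths[field]
        nodes = [
            (prefix, [key for key, _ in run])
            for prefix, run in groupby(nodes, key=lambda n: n[0] >> shift)
        ]
        levels.append(nodes)
    return levels[::-1]


# Made-up records with hypothetical field codes.
records = [
    {"brand": 1, "color": 0},
    {"brand": 0, "color": 1},
    {"brand": 1, "color": 0},
    {"brand": 0, "color": 0},
]
top, leaves = build_breakdown(records, ["brand", "color"], {"brand": 2, "color": 2})
```

After the sort, every level is found in a single linear pass over the level below it, which is consistent with sorting dominating the cost.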

Aggregates
Zero or more aggregates are associated with each breakdown
Aggregate values are associated with breakdown nodes and leaves
Aggregate functions are associated with breakdown levels
Leaf aggregate values are computed from value fields in the records using the leaf aggregate function
Node aggregate values are computed from the children's aggregate values using the node aggregate function
Typically only one value field in the records is considered
Typically aggregate functions are identical between levels

Example
Country: Sweden (S), Finland (F), Denmark (D), Norway (N)
Brand: Audi (A), Ford (F), Volvo (V)
Color: White (W), Red (R), Blue (B)
Breakdown: Brand, Color, Country
Aggregate: Sales

Example (breakdown tree)
Level 1, Brand: A, F, V
Level 2, Color (under each brand): B, R, W
Level 3, Country (under each color): D, F, N, S
Highlighted path: Audi, White, Finland
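The example breakdown (Brand, Color, Country) with a Sales aggregate can be sketched in Python. The sales figures are made up, the function name and dictionary layout are hypothetical, and sum is assumed as both the leaf and the node aggregate function, in line with the remark that the functions are typically identical between levels.

```python
from collections import defaultdict


def aggregate(records, levels, value_field, leaf_fn=sum, node_fn=sum):
    """Return {path: aggregate value} for every leaf and node.
    A path is a tuple of field values from the root; () is the root."""
    # Leaf aggregates are computed from the value field of the records.
    leaf_values = defaultdict(list)
    for rec in records:
        path = tuple(rec[field] for field in levels)
        leaf_values[path].append(rec[value_field])
    result = {path: leaf_fn(vals) for path, vals in leaf_values.items()}

    # Node aggregates are computed bottom-up from the children's values.
    current = dict(result)
    for depth in range(len(levels) - 1, -1, -1):
        children = defaultdict(list)
        for path, value in current.items():
            children[path[:depth]].append(value)
        current = {path: node_fn(vals) for path, vals in children.items()}
        result.update(current)
    return result


# Made-up sales records for illustration.
records = [
    {"brand": "Audi", "color": "White", "country": "Finland", "sales": 3},
    {"brand": "Audi", "color": "White", "country": "Finland", "sales": 2},
    {"brand": "Audi", "color": "Red", "country": "Sweden", "sales": 4},
    {"brand": "Volvo", "color": "Blue", "country": "Sweden", "sales": 7},
]
totals = aggregate(records, ["brand", "color", "country"], "sales")
```

A drill-down then amounts to reading the values along one path, for example Audi, then Audi/White, then Audi/White/Finland.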

Traditional Analytics in Retail
1. E-receipts sent to Data Warehouse
2. Analysis of new and historical data
3. Infrequent reports (once per week etc.)
Data not relevant to what's happening now is involved in the analysis

Real-time On-the-fly Analytics in Retail
1. E-receipts sent to Data Warehouse
2. E-receipts intercepted/sent in real-time to BioCAM WS
3. Analysis performed on-the-fly
4. Reporting in real-time
Real-time monitoring, analysis and reporting with minimum stress on the data warehouse

Whatever Mart, Inc.: The Multi Tera Dollar Retail Corporation
1 500 stores distributed across the globe, open 10.00-18.00
15 000 unique products when taking size, color etc. into account
Customers purchase an average of 30 random products in each open store every second
At peak rate, 2 300 customers purchase 45 000 products per second, thus surpassing 500 000 USD per second in net sales
E-receipts are reported immediately to BioCAM Web Service
Five different analyses are performed every ten seconds
Reports are presented on a dashboard and updated in real-time
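The peak figures can be checked with a little arithmetic, shown here in Python. The sketch assumes that the peak occurs when all stores are open at once, and it treats the stated net sales as a lower bound; the derived per-customer and per-product averages are not given in the slides.

```python
stores = 1500
products_per_store_per_second = 30

# Peak product rate, assuming all stores are open simultaneously.
peak_products_per_second = stores * products_per_store_per_second

peak_customers_per_second = 2300
peak_sales_usd_per_second = 500_000  # "surpassing", so a lower bound

# Derived averages at peak (not stated in the slides).
products_per_customer = peak_products_per_second / peak_customers_per_second
usd_per_product = peak_sales_usd_per_second / peak_products_per_second

print(peak_products_per_second)
print(round(products_per_customer, 1))
print(round(usd_per_product, 2))
```

The product of stores and per-store rate reproduces the 45 000 products per second quoted for the peak.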

Whatever Mart, Inc.: The Multi Tera Dollar Retail Corporation
Almost 1000 billion transactions since launch
whatever.oricane.com

Benchmarks
Configurations:
  Web Service: access via Web Service front-end
  Direct Access: test program linked with BioCAM, access via C API
  Stripped: direct access to BioCAM stripped from RTDS
Four different database sizes (number of records)
Six different transaction loads (record updates per second)

Aggregate Value Re-calculation Time
2500-3000 record transactions per second
Re-calculation speed not dependent on transactions/second
Measured in milliseconds

Web Service  Direct Access  Stripped
35           31             29
167          153            133
804          711            650
1580         1429           1302

Transaction Time

Web Service             Direct Access           Stripped
Load (x/s)  Time (us)   Load (x/s)  Time (us)   Load (x/s)  Time (us)
454         1201        407         183         483         144
1463        1824        1246        161         1464        125
2510        2684        2036        143         2275        118
2930        3064        2408        132         2772        109
4568        32150       3414        128         4107        100
5975        235471      4583        120         5742        91

Direct Access

Stripped

Conclusion
Aggregate value re-calculation cost that is linear in database size is expected, since the optimized re-calculation scheme is not yet implemented
Transaction cost is completely dominated by the Web Service front-end, especially at higher load
  It would be interesting to bypass the web server and run JSON over IP
Transaction cost for Direct Access and Stripped decreases with higher load, most likely due to reduced context switching and higher cache locality

Key Application Area: Gaming
Counter Strike Global Offensive (CSGO) Real-time Statistics Site to be launched
Currently 150 000 players on-line simultaneously
Player base grows exponentially
Partnership with World #1 CSGO team Ninjas in Pyjamas (www.nip.gl)
Image source: http://www.pcgamer.com/valve-explains-how-csgo-became-the-second-most-played-game-on-steam/

Key Application Area: Energy
Oricane is involved in Cloudberry Datacenters (http://www.cloudberry-datacenters.com)
  Focus is on energy savings in data centers; discussions are slow
Oricane wants to address:
  Energy production
  Energy trading
  Embedded applications
Looking for a fast-paced key partner with lots of data to process
  Pilot project: value creation from ultra-high analytics performance