Zynga Analytics Leveraging Big Data to Make Games More Fun and Social



Similar documents
the missing log collector Treasure Data, Inc. Muga Nishizawa

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William

From Spark to Ignition:

Big Data Analytics Nokia

Cost-Effective Business Intelligence with Red Hat and Open Source

Big Data Technologies Compared June 2014

Oracle Big Data SQL Technical Update

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Ganzheitliches Datenmanagement

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

Apache Hadoop in the Enterprise. Dr. Amr Awadallah,

MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)

Performance and Scalability Overview

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

The 3 questions to ask yourself about BIG DATA

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

How To Handle Big Data With A Data Scientist

Cloud Big Data Architectures

Performance and Scalability Overview

Analyzing Big Data at. Web 2.0 Expo, 2010 Kevin

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

Accelerate Data Loading for Big Data Analytics Attunity Click-2-Load for HP Vertica

KNIME & Avira, or how I ve learned to love Big Data

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

SAP and Hortonworks Reference Architecture

I/O Considerations in Big Data Analytics

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

So What s the Big Deal?

Database Scalability and Oracle 12c

Apache Kylin Introduction Dec 8,

Next-Generation Cloud Analytics with Amazon Redshift

Parallel Data Warehouse

HDP Hadoop From concept to deployment.

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Facebook s Petabyte Scale Data Warehouse using Hive and Hadoop

Big Data for Investment Research Management

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Database Scalability {Patterns} / Robert Treat

Architecting for the Internet of Things & Big Data

Putting Apache Kafka to Use!

May 2015 Robert Gibbon & Jochen Stroobants

An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP Oracle ESG Data Systems Architecture

Tableau Visual Intelligence Platform Rapid Fire Analytics for Everyone Everywhere

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

CitusDB Architecture for Real-Time Big Data

Big data blue print for cloud architecture

Moving From Hadoop to Spark

Big Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013

Big Data Defined Introducing DataStack 3.0

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

Partner Camp Leistungsstarkes Log-Management für physische, virtuelle und cloud-basierte Umgebungen. Tomas Baublys

The Enterprise Data Hub and The Modern Information Architecture

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

[Hadoop, Storm and Couchbase: Faster Big Data]

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian

Real World Big Data Architecture - Splunk, Hadoop, RDBMS

Cisco IT Hadoop Journey

Real Time Big Data Processing

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.

ORACLE TAX ANALYTICS. The Solution. Oracle Tax Data Model KEY FEATURES

GigaSpaces Real-Time Analytics for Big Data

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Il mondo dei DB Cambia : Tecnologie e opportunita`

A Scalable Data Transformation Framework using the Hadoop Ecosystem

BI/Analytics for NoSQL: Review of Architectures

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Big Data Infrastructure at Spotify

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

SQL Server What s New? Christopher Speer. Technology Solution Specialist (SQL Server, BizTalk Server, Power BI, Azure) v-cspeer@microsoft.

In-Memory Analytics: A comparison between Oracle TimesTen and Oracle Essbase

America s Most Wanted a metric to detect persistently faulty machines in Hadoop

Search Big Data with MySQL and Sphinx. Mindaugas Žukas

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April

Splice Machine: SQL-on-Hadoop Evaluation Guide

Presenters: Luke Dougherty & Steve Crabb

Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH

INTRODUCING DRUID: FAST AD-HOC QUERIES ON BIG DATA MICHAEL DRISCOLL - CEO ERIC TSCHETTER - LEAD METAMARKETS

In-memory computing with SAP HANA

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

The Future of Data Management

Modern Data Warehouse

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

Harnessing the Power of the Microsoft Cloud for Deep Data Analytics

Fusion Applications Overview of Business Intelligence and Reporting components

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

Transcription:

Connecting the World Through Games Zynga Analytics Leveraging Big Data to Make Games More Fun and Social Daniel McCaffrey General Manager, Platform and Analytics Engineering

World s leading social game developer And growing rapidly web and mobile 3 rd party games on the Zynga Platform

Built on global platforms, and our own Zynga API (services) and platform Play with your friends Synchronous or Asynchronous play Cooperative or Competitive play

By The Numbers Users Game Data Server Data ~260 million MAUs ~60 million avg DAUs worldwide Vertica driven ~60 billion rows/day ~10TB daily semistructured data ~1.5PB source data Largest 230 2U nodes Splunk 13TB per day raw logs from server and app logs Vertica or Hadoop for archives

Starting Core Concepts What helped made Analytics successful at Zynga

Metrics Driven Culture Management desire to track goal progress by metrics

Analytics Everywhere Wanted open data access as much as possible Freely accessible reports by everyone Open Ad-hoc SQL access Easy external service integration

Ease of Use and Integration Wanted easy/standard tool integration ETL/ELT tools Analysis tools/db visualizers Reporting External service integration via SQL Control data structure at moment logged via an API Semi-structured data capture for flexibility Centralized data schemas for easier analysis

Organizational Structure Centralized Data/BI (centralized data schema and aggregation) Data Infrastructure (centralized data flow) Network level Data Analysts Centralized but embedded Game and partner group Data Analysts Schema, architecture and data knowledge Share insights company wide

Art + Science, not Art vs Science Art: Generate the game idea and implement Science: Find out if it s good/fun. Listen to the players.

SERVICES AND ARCHITECTURE

DATABASES IN USE Vertica: primary game/user analytics stores. 10TB/day,70 billion rows/day Splunk: primary log analytics stores 13TB/day MySql Cluster 7.2x: streaming event DB. 70 nodes,650million rows/day MySql: many single node and sharded transactional DB s Membase: memory store with persistence. Replaced memcache+mysql. HBase: Messaging service store. Disk>memory Memcache,Memqueue: Service stores when persistence isn t needed. SOLR: Run-time text search needs Redis: Service stores with ranged queries. Oracle: Finance Memsql: being looked at

ZTRACK API Simple to use logging API PHP, Java and Ruby REST API as part of Zynga API Backend Leverages: FB Scribe for scalable, fast worldwide message forwarding Custom Java ETL database loaders Semi-structured data logging (flexible taxonomy) PHP example ztrack_count($user_id, "myevent", $value1, kingdom", phylum", class", family", genus");

Vertica Data Warehouses MPP Compressed Column Store, Full ANSI SQL 6-9x compression on data, extremely fast bulk loading Stats >60Billion rows/day, trickle-in/real-time from ZTrack >10TB/day Largest is 230 2U nodes, next generation will be 560 to 1,000 Clusters Production and Mirror, Social Graph, Sample, Virtual Goods tracking for revenue recognition/sox, Poker hands, International, Test and Staging

Reporting and Analysis stats.zynga.com Over 6,000 distinct report types ~1080 DAU, ~1,480 WAU ~3,000 report runs per day, 500-600 distinct reports each day ~15,000 ad-hoc queries from users per day Taxonomy slicer reports Ad-hoc SQL access clusters for analysis Analysts, Product Managers, Engineers and BI team work to create new insights, metrics and profiles and operationalize

Data Services Allows for run-time decisions in game or services API backed by fast in-memory data access (membase) Network level data across data centers using membase sync PHP and Java and REST API as part of Zynga API Access to real-time and daily aggregated user and game data, network level Some uses include: Personalization Targeting Profiling Matchmaking

Experiment Platform, A/B Testing Provides real-time: Controlled Experimentation via web UI and game hooks Reporting Simple API Java, PHP and REST API Impact Game and Platform Design Ability to see what happened in real time Lots of experiments. Many fail. ~3-5K active at any time atm.

Real-Time Streaming Data Events Real-time scalable event aggregation, using time windows Presently handles over 70 billion events per day Technology: Custom java for processing Memcaches Memqueues ( enumerated memcahe) MySql Cluster 7.2x: ~70 nodes, 624mil rows/day, 300k query/day Please see Michael Fan and Rushan Chen s lightning talk on Streaming later today

Streaming Architecture

Streaming Uses Operation health monitoring/reporting and alerting Fast key metric reporting, offloading from Vertica Data Validation Compare to Vertica Future: more timely run-time decisions in-game Please see Michael Fan and Rushan Chen s lightning talk on Streaming later today

Analytics Maturity at Zynga II: What Happened? III: Why Did it Happen? IV: Create Advantage I: Capture Events Transactional systems Reporting, Dashboards and Alerts Analysis, Experiments and Mining Analytics Driven Execution and Innovation via Data Services, new Game Features, Guessing Awareness Understanding Action

APPENDIX/SUMMARY

Services Summary Centralized network level tracking, reporting and warehouses Embedded analysts as a service Centralized network analysts, BI and data infrastructure Data Services Run-time data access Experiment service Easy A/B testing Streaming event service real-time scalable event aggregation

Zynga Core Concepts Summary Commitment to a metric driven mindset Open Access to data reports, ad-hoc and external services Ease of use tools, external services, schemas Art + Science Experiment on ideas, analyze and make changes. Use analytics to listen to the players and make changes Log with an API, add some structure data as possible at moment logged Standardize taxonomies quickly and enforce once mature

Connecting the World Through Games