Big Data Advanced Analytics for Game Monetization. Kimberly Chulis




Big Data Advanced Analytics for Game Monetization Kimberly Chulis CEO Core Analytics, LLC

Core Analytics / Game Loyalty. Bay Area and Chicago based digital advanced analytics firm. Big Data / NoSQL advanced analytics solutions. Focuses on optimization and measurement of video, social and mobile games, in-game advertising and gamification analytics. Micro-segmentation, dashboard reporting, data integration, predictive and multivariate analytics.

Describe size of Gaming Industry and Exponential Growth. [Charts: video games revenue ($ billions), 2011 to 2014; computer games and social games revenue ($ billions), 2012. Source: Colin Sebastian for RW Baird.] While these numbers indicate that the video game industry currently holds the largest share of the market, a shift in preferences and player demographics is under way that is expected to result in a partial decline in the popularity of core video games and a rapid decline in the console variety.

Describe size of Gaming Industry and Exponential Growth. Social games: offered at a fraction of the price through social networks and played on various mobile platforms; 54% female; mobile payments and PayPal for purchases across platforms and devices. Traditional video and MMO games: predominantly male (18-34); pay by cash or credit card; played solo or within a limited interactive environment. Mobile games: represent a fundamental change in the gaming landscape, expanding who, how and why gamers are playing.

Describe size of Gaming Industry and Exponential Growth. Stark differences are emerging in game revenue patterns across devices: iOS games generate 85% of in-game revenues, ahead of Android and other platforms. Console games (and related hardware and apparatus sales) are expected to decline, while web-based games become the go-to platform. This trend will run parallel to continued growth in video games (played on computers) and to explosive growth in mobile games. Conclusion: big opportunities for analytics providers.

Big Data and Social and Online Gaming

NoSQL TECHNOLOGY

Technology

Technology. Database technology shift: from relational to NoSQL. New solutions were introduced to handle data on a very large scale. Methods of extending the capacity of legacy systems were also introduced: sharding, denormalization, and distributed caching.

Technology. Sharding is the practice of partitioning data across multiple servers. It requires knowledge of which server holds which data and is limited by the fact that you can't perform joins across shards; you must also maintain schemas for each server. Denormalization is another method that involves grouping and indexing redundant data; it often results in latency and in issues with maintaining concurrency in relational database systems. Distributed caching keeps recently used data in memory. When data is needed, the application (web, game, social network, search engine, and so on) first checks a distributed caching system, such as memcached, instead of going back to the relational database.
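As a minimal sketch of two of these techniques, the Python below routes a key to a shard by hashing it and reads through an in-memory cache before touching the database. The shard names, the `CacheAsideStore` class, and the sample player record are hypothetical; plain dictionaries stand in for memcached and for the relational database.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database servers.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(player_id: str) -> str:
    """Pick a shard by hashing the key, so the application knows where a row lives."""
    digest = hashlib.md5(player_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


class CacheAsideStore:
    """Check an in-memory cache first; fall back to the simulated database on a miss."""

    def __init__(self, database: dict):
        self.database = database  # stand-in for the relational database
        self.cache = {}           # stand-in for a distributed cache such as memcached
        self.db_reads = 0         # counts how often the database is actually hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later reads
        return value


store = CacheAsideStore({"player:42": {"name": "ada", "level": 7}})
first = store.get("player:42")   # miss: read from the database, then cached
second = store.get("player:42")  # hit: served from memory
print(shard_for("player:42"), store.db_reads)
```

Hash-based routing is only one way to assign keys to shards; range-based or lookup-table schemes are equally plausible and the slides do not say which was used.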

Advantage of NoSQL over RDBMS. An RDBMS won't scale beyond a certain point. [Chart: system cost and application response time.] The database scales out: add more web servers.

Technology: Warehousing Options. Commercial NoSQL options integrate with Hadoop and other open source tools and greatly extend these capabilities with analytics, text mining, in-application processing, map reduce functions, and graphing options: IBM InfoSphere BigInsights; Cloudera (yearly support fee for the Enterprise Edition).

NoSQL Software Types: Document; Document & Key-value; Document (JSON); Graph database; Key-value; Key-value & Hierarchical & Document; Key-value, multi-level.

NoSQL Software Source: Wikipedia


Export Social Media Gaming Data from MySQL into Hadoop. [Diagram: social media sources, and others like forums and blogs, return JSON; the app parses it and stores the results in a table in MySQL in the cloud; a Sqoop client moves the data into HDFS.]

Twitter stored in MySQL

Facebook stored in MySQL

Import Process. [Diagram: the Sqoop client reads the MySQL table metadata, generates a record container class, and launches a MapReduce job; the map tasks use the generated class and write to HDFS.]

Migrate from MySQL to Hadoop. Using Sqoop to migrate from MySQL to Hadoop. Results: retrieve the results from Hadoop.

Hadoop HDFS format (CSV): ID, Search term, User name, Created date, ..., Mention, Language
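A minimal sketch of reading records in this layout, assuming the file has already been copied out of HDFS as plain CSV text. The sample rows and the `read_mentions` helper are hypothetical, and because the middle columns are elided on the slide, only the first four and the last two fields are addressed.

```python
import csv
import io

# Hypothetical sample rows; the "x" and "y" fields stand in for the elided columns.
sample = io.StringIO(
    '1,kroger,some_user,2013-01-05,x,y,"Great deals, as always",en\n'
    '2,kroger,other_user,2013-01-06,x,y,Long checkout lines today,en\n'
)


def read_mentions(handle):
    """Yield one dict per CSV row, keeping only the columns named on the slide."""
    for row in csv.reader(handle):
        if len(row) < 6:
            continue  # skip rows too short to hold the named columns
        yield {
            "id": row[0],
            "search_term": row[1],
            "user_name": row[2],
            "created_date": row[3],
            "mention": row[-2],
            "language": row[-1],
        }


records = list(read_mentions(sample))
for record in records:
    print(record["user_name"], "->", record["mention"])
```

Using `csv.reader` rather than `str.split(",")` matters here because message text can itself contain commas.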

Real-time Analytics: Storm and Hadoop. Hadoop runs all functions at once, but with high latency. Storm runs incremental functions quickly. Storm is a distributed and fault-tolerant real-time computation system; it guarantees that data will be processed and is horizontally scalable. Storm concepts (see also next slide). Tuple: named list of values. Streams: unbounded sequence of tuples. Spouts: source of streams (e.g. Twitter). Bolts: bring the tuples together (functions, filters, joins, talking to databases). Topology: how everything is linked together (graph of computations).

Storm Concepts. [Diagrams: a stream as a sequence of tuples; spouts emitting multiple streams of tuples; a bolt consuming and emitting tuples; a topology as a graph of spouts and bolts.]
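To make the vocabulary concrete, here is a toy model of a topology written with plain Python generators. It does not use the Storm API (Storm topologies are normally written against its Java interfaces); the `Mention` tuple, the sample messages, and the function names are all hypothetical, and the stream is finite where a real one would be unbounded.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Mention(NamedTuple):
    """A tuple in the Storm sense: a named list of values."""
    user: str
    text: str


def mention_spout() -> Iterator[Mention]:
    """Spout: a source of a stream (finite here, unbounded in a real system)."""
    yield Mention("ann", "love this game")
    yield Mention("bob", "this game crashed")
    yield Mention("cat", "new level is great")


def split_bolt(stream: Iterable[Mention]) -> Iterator[str]:
    """Bolt: consumes tuples and emits one word at a time."""
    for mention in stream:
        for word in mention.text.split():
            yield word


def count_bolt(stream: Iterable[str]) -> Counter:
    """Bolt: aggregates the word stream into counts."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts


# Topology: spout -> split bolt -> count bolt.
word_counts = count_bolt(split_bolt(mention_spout()))
print(word_counts.most_common(2))
```

The sketch shows only the dataflow; distribution across workers, fault tolerance, and guaranteed processing are what Storm itself adds.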

Storm and Hadoop Integration. Example: a very large amount of data needs to be parsed from different sources in real time. One option is to implement Hadoop and create a Map/Reduce job, but the job takes a long time to complete, because it finishes only when the slowest reducer finishes. This means unknown cycle completion times.

Storm and Hadoop Integration. Another complexity: we have a number of processes updating our feeds, and we would like to control which of them updates the database. We should ask ourselves: is there a system that allows us to deploy the queue / worker system without inheriting the complexity? Yes, it's called Storm.

Storm Cluster. [Diagram: Nimbus is the master node (similar to the Hadoop JobTracker); ZooKeeper is used for cluster coordination; Supervisors run worker processes.]

Storm and Hadoop Integration. Storm allows us to write real-time topologies. We don't have to worry about scalability, fail-over, or IPC. Real-time analytics!

Technology Filtering Hadoop. Freely licensed software framework developed by the Apache Software Foundation. Scales from a single computer up to thousands of computers. Hadoop Distributed File System (HDFS): designed for storing very large files, with streaming data access patterns, running on clusters of commodity hardware.

Technology Filtering Hadoop Source: Apache Hadoop

Technology Filtering Hadoop
Functional layer: Sub-project
Modeling and development: MapReduce, Pig, Mahout
Storage and data management: HDFS, HBase, Cassandra
Data warehousing, summarization, query: Hive, Sqoop
Data collection, aggregation, analysis: Chukwa, Flume
Metadata, table and schema management: HCatalog
Cluster management, job scheduling, workflow: ZooKeeper, Oozie, Ambari
Data serialization: Avro

Technology Filtering Hadoop. Vendors: Amazon Web Services, Datameer, EMC Greenplum, HStreaming, MapR, Pentaho, Zettaset, Cloudera, DataStax, Hortonworks, IBM, Outerthought, Platform Computing.

Technology Filtering MapReduce. MapReduce is a programming framework popularized by Google and used to simplify data processing across massive data sets. MapReduce 1: the classic framework. MapReduce 2: YARN (Yet Another Resource Negotiator), for very large clusters in the region of 4000 nodes or higher.

Technology Filtering MapReduce. [Diagram: Start, then Map 1, Map 2, ..., Map N in parallel, then Reduce, then End.]
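The map, group, and reduce flow in the diagram can be sketched in a single Python process. This is an illustration of the programming model only, not Hadoop code; the function names and the three input splits are hypothetical, and word counting is used because it is the customary example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (key, value) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# Each string plays the role of one input split handled by one map task.
splits = ["big data big games", "social games", "mobile games big"]
mapped = chain.from_iterable(map_phase(split) for split in splits)
totals = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(totals)
```

In Hadoop the map calls would run in parallel on different nodes and the grouping step would move data across the network; here everything runs sequentially in one process.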

Technology Filtering MapReduce1 Source: Hadoop The Definitive Guide

Technology Filtering MapReduce2 Source: Apache Hadoop

Technology: Open Source Analytics. There are various open source applications available: Mahout, R.

R. Open source R has been integrated to run massively parallel statistical processes directly in Hadoop nodes. Functionality: used for statistical computing and graphics; linear and nonlinear modeling; classical statistical tests; time series analysis; classification; clustering; well-designed publication-quality plots. Runs on Unix and similar platforms, Windows and MacOS.

Mahout. Open source analytics tool that runs on top of systems such as Hadoop's MapReduce paradigm. Apache Mahout is a machine learning engine that provides classification, clustering, and collaborative filtering.

Technology: Analytics Gap. Commercial analytics packages like SPSS and SAS. [Diagram: raw data, processed data.]

Video and Social Games PRACTICAL EXAMPLES

Game Analytics. Vendors: Kontagent, Flurry, Mixpanel, Totango, Claritics, Google Analytics. Fewer vendors focus on computer and MMO games. No single analytics provider appears to focus on delivering cross-game-platform analytics.

Game Analytics. Metrics of social games. Daily active users (DAU). Monthly active users (MAU). DAU / MAU ratio. Engagement: measures time spent playing a game. K-factor: infection rate of viral game growth as the core and casual player base expands. Average revenue per user (ARPU). Lifetime value (LTV): captures a player's value to the game based on in-game purchases and other monetization-related behaviors, player influence on virality, and net game evangelism.
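A rough sketch of how several of these metrics might be computed from session and purchase logs. The sample data and function names are hypothetical, and the definitions used are common conventions rather than anything the slides specify: ARPU is taken as monthly revenue divided by MAU, and the K-factor as invites sent per user multiplied by the invite conversion rate.

```python
from datetime import date

# Hypothetical play sessions: (player_id, day played).
sessions = [
    ("p1", date(2013, 5, 1)), ("p2", date(2013, 5, 1)),
    ("p1", date(2013, 5, 2)), ("p3", date(2013, 5, 20)),
]
# Hypothetical in-game purchase totals for the same month, per player.
revenue = {"p1": 4.0, "p3": 2.0}


def dau(sessions, day):
    """Daily active users: distinct players seen on one day."""
    return len({player for player, played in sessions if played == day})


def mau(sessions, year, month):
    """Monthly active users: distinct players seen in one calendar month."""
    return len({p for p, d in sessions if (d.year, d.month) == (year, month)})


def k_factor(invites_per_user, conversion_rate):
    """Assumed definition: invites sent per user times invite conversion rate."""
    return invites_per_user * conversion_rate


daily = dau(sessions, date(2013, 5, 1))
monthly = mau(sessions, 2013, 5)
stickiness = daily / monthly              # DAU / MAU ratio
arpu = sum(revenue.values()) / monthly    # assumed: revenue per monthly active user
print(daily, monthly, round(stickiness, 2), arpu, k_factor(5, 0.1))
```

A K-factor above 1 would mean each player brings in more than one new player on average, which is the usual threshold for viral growth.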

Game Monetization

Video Game Subscription. Imagine a large MMO video game: World of Warcraft, Star Wars: The Old Republic.

Video Game Subscription Model
Pay to play: players must pay a monthly subscription fee.
Free to play: usually involves an upfront software cost but no additional payments.
Freemium: allows players to access game content and play for free but offers options to pay for additional content and access.

Video Game: Pay to play. Game analytics is focused on understanding who the most valuable players are and how they play. Propensity modeling: identify those players with the highest propensity to do one of the following: continue a subscription; return to play a game after a subscription pause; encourage new players to subscribe; become skilled and persuasive guild leaders.
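The slides do not say which propensity model was used. As one illustrative possibility, the sketch below fits a small logistic regression by gradient descent to score how likely a player is to continue a subscription. The two features (hours played per week, guild membership), the toy training data, and the function names are all hypothetical.

```python
import math

# Hypothetical training data: (hours per week, guild member 0/1) -> renewed 0/1.
players = [
    ((1.0, 0), 0), ((2.0, 0), 0), ((3.0, 0), 0), ((4.0, 1), 0),
    ((6.0, 0), 1), ((8.0, 1), 1), ((10.0, 1), 1), ((12.0, 1), 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=2000, learning_rate=0.05):
    """Fit logistic regression weights with plain batch gradient descent."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for features, label in data:
            score = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(score) - label
            grad_b += error
            for i, x in enumerate(features):
                grad_w[i] += error * x
        bias -= learning_rate * grad_b / len(data)
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
    return weights, bias


def propensity(features, weights, bias):
    """Estimated probability that a player with these features renews."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


weights, bias = train(players)
engaged = propensity((11.0, 1), weights, bias)
casual = propensity((2.0, 0), weights, bias)
print(round(engaged, 3), round(casual, 3))
```

Ranking players by this score would give the "highest propensity" list the slide describes; in practice a statistics package or a library would replace the hand-written training loop.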

Video Game. Data elements include traditional game-time dashboard key performance indicators (KPIs). KPIs: time to complete levels; avatar selection; gender of avatar; game-related tweets; language; solo vs interactive behaviors; interaction style indicators; game strategy behavior variables; social network activity.

Video Game: Micro-segmentation. Involves segmenting a player base to understand distinct segment preferences and behaviors. These guide targeted game design, localization that reflects the preferences of regional segments, and the design of appealing, targeted extension packages and additional content.

Social Game Ad and Virtual Good Monetization. Early adopters of Big Data technology: cloud computing solutions, data mining applications, player analytics. These allow social game studios to understand in real time why users are abandoning a game and to identify other players at risk of leaving, so they can develop player retention strategies before those players quit.

Social Game Ad and Virtual Good Monetization

Where the Game Industry is Going

BrandMeter & Gaming Data PRACTICAL IMPLEMENTATION

Process flow. [Diagram components: raw social media comment files, web log data, external files; API and data integration processes; BrandMeter data warehouse; BrandMeter processes (text mining and derived variables, data cleansing and manipulation); analytic data mart; BrandMeter segmentation and predictive modeling algorithms; BrandMeter social media data mart; BrandMeter dashboard reporting.]

Cloud Architecture. [Diagram components: dashboard reporting; analytics; segmentation; BrandMeter datamarts (demographics, personality, mood, loyalty, geographic, other); databases (MySQL, NoSQL); sources (social media, web, mobile).]

Social Media Data source: Facebook/Twitter. 1. Search request to Facebook and Twitter: http://search.twitter.com/search.json?q=kroger https://graph.facebook.com/search?q=kroger&type=post 2. The app receives the results in JSON format.

Social Media Data source: Facebook/Twitter. 3. The app calls a JSON library that parses the data. 4. For each JSON record, the data is split into several fields, such as (Twitter) From_User, Created_At, and Mention (actually the message content). 5. The app stores the record in the MySQL DB.
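The parse-and-store steps might look roughly like the sketch below. It makes no network calls: the response body is a hypothetical sample whose layout (a `results` list with `from_user`, `created_at`, and `text` keys) is an assumption, and an in-memory SQLite database stands in for MySQL so the example is self-contained. The table name and the `store_results` helper are also hypothetical.

```python
import json
import sqlite3

# Hypothetical response body; the key names are assumed, not taken from the slides.
response_body = json.dumps({
    "results": [
        {"from_user": "some_user", "created_at": "Sat, 05 Jan 2013 10:00:00 +0000",
         "text": "Weekly deals at kroger"},
        {"from_user": "other_user", "created_at": "Sat, 05 Jan 2013 11:30:00 +0000",
         "text": "kroger parking lot is full"},
    ]
})

connection = sqlite3.connect(":memory:")  # stand-in for the MySQL database
connection.execute(
    "CREATE TABLE twitter_mentions (from_user TEXT, created_at TEXT, mention TEXT)"
)


def store_results(body, conn):
    """Parse the JSON body, split each record into fields, and insert one row each."""
    records = json.loads(body).get("results", [])
    rows = [(r.get("from_user"), r.get("created_at"), r.get("text")) for r in records]
    conn.executemany("INSERT INTO twitter_mentions VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


stored = store_results(response_body, connection)
row_count = connection.execute("SELECT COUNT(*) FROM twitter_mentions").fetchone()[0]
print(stored, row_count)
```

Parameterized inserts (`?` placeholders) keep message text containing quotes or other special characters from breaking the SQL statement.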

Social Media Data Source. [Diagram: social media sources return JSON; the app parses it and stores the results in a table; NoSQL and MySQL integration feeds the BrandMeter datamart.]

Screen shots of the Dashboard / NoSQL / MySQL Data / Analytics Jact Media LLC.

Gaming data provided by Jact Media LLC LIVE DEMO BRANDMETER DASHBOARD