Big Data Advanced Analytics for Game Monetization Kimberly Chulis CEO Core Analytics, LLC
Core Analytics / Game Loyalty Bay Area and Chicago based digital advanced analytics firm. Big Data / NoSQL advanced analytics solutions. Focuses on optimization and measurement of video, social and mobile games, in-game advertising and gamification analytics. Micro-segmentation, dashboard reporting, data integration, predictive and multivariate analytics
Describe size of Gaming Industry and Exponential Growth [Charts: video games revenue ($ billions), 2011 and 2014; computer games and social games revenue, 2012 ($ billions). Source: Colin Sebastian for RW Baird] While these numbers indicate that the video game industry currently holds the largest share of the market, an important shift in preferences and player demographics is occurring that is expected to result in a partial decline in popularity of core video games and a rapid decline in the console variety.
Describe size of Gaming Industry and Exponential Growth Social games: offered at a fraction of the price through social networks and played on various mobile platforms; 54% female; mobile payments and PayPal for purchases across platforms and devices. Traditional video and MMO games: predominantly male (18-34); pay by cash / credit card; played solo or within a limited interactive environment. Mobile games: represent a fundamental change in the gaming landscape, expanding who, how and why gamers are playing
Describe size of Gaming Industry and Exponential Growth Stark differences are emerging in game revenue patterns across devices: iOS games generate 85% of in-game revenues, ahead of Android and other platforms. Expected decline in console games (and related hardware and apparatus sales), with web-based games becoming the go-to platforms. This trend will parallel continued growth in video games (played on computers) and explosive growth in mobile games. Conclusion: big opportunities for analytics providers
Big Data and Social and Online Gaming
NoSQL TECHNOLOGY
Technology
Technology Database technology shift Shift from relational to NoSQL New solutions introduced to handle data on a very large scale Methods of extending the capacity of legacy systems were introduced Sharding Denormalization Distributed caching
Technology Sharding is the practice of partitioning data across diverse servers. It requires knowledge of the server location of the data and is limited by the fact that you can't perform joins across shards; you must also maintain schemas for each server. Denormalization is another method that involves grouping and indexing redundant data, and it often results in latency and issues with maintaining concurrency in relational database systems. Distributed caching, which caches recent data in memory, is useful when the same data is needed again: the application (web, game, social network, search engine, and so on) first checks a distributed caching system, such as memcached, for the needed data instead of going back to the relational database
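The cache-first lookup described on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: plain Python dictionaries stand in for both memcached and the relational database, and the class name `CacheAsideStore` and the key `player:42` are hypothetical.

```python
class CacheAsideStore:
    """Check the cache first; fall back to the database only on a miss."""

    def __init__(self, cache, database):
        self.cache = cache          # assumption: dict standing in for memcached
        self.database = database    # assumption: dict standing in for the RDBMS
        self.db_reads = 0           # counts how often the database was consulted

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value            # cache hit: no database round trip
        self.db_reads += 1
        value = self.database[key]  # cache miss: read from the system of record
        self.cache[key] = value     # keep it in memory for the next request
        return value


store = CacheAsideStore(cache={}, database={"player:42": {"level": 7}})
first = store.get("player:42")   # miss: goes to the database
second = store.get("player:42")  # hit: served from memory
```

A production setup would also need expiry and invalidation when the underlying row changes, which this sketch leaves out.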
Advantage of NoSQL over RDBMS RDBMS: won't scale beyond a certain point [Chart axes: system cost, application response time] NoSQL: database scales out, add more web servers
Technology Warehousing Options Commercial NoSQL options that integrate with Hadoop and other open source tools and greatly extend these capabilities with analytics, text mining, in-application processing, MapReduce functions, and graphing options: IBM InfoSphere BigInsights, Cloudera (yearly support fee for the Enterprise Edition)
NoSQL Software Types: Document; Document & Key-value; Document (JSON); Graph database; Key-value; Key-value & Hierarchical & Document; Key-value; Multi-level
NoSQL Software Source: Wikipedia
Export Social Media Gaming Data from MySQL into Hadoop [Diagram: social media and other sources such as forums / blogs -> JSON: parse and store results into table -> MySQL in the cloud -> Sqoop client -> HDFS]
Twitter stored in MySQL
Facebook stored in MySQL
Import Process [Diagram: the Sqoop client reads table metadata from MySQL, generates a record container class, and launches a MapReduce job that uses it; parallel map tasks write the imported rows to HDFS]
Migrate from MySQL to Hadoop Using Sqoop to migrate from MySQL to Hadoop. Results: retrieve the results from Hadoop
Hadoop HDFS FORMAT CSV: ID, Search term, User name, Created date, ..., Mention, Language
Real-time Analytics: Storm and Hadoop. Hadoop: runs all functions at once, but with high latency. Storm: runs incremental functions quickly; a distributed and fault-tolerant real-time computation system; guarantees data will be processed; horizontally scalable. Storm concepts (see also next slide): Tuple: named list of values. Streams: unbounded sequence of tuples. Spouts: source of streams (e.g. Twitter). Bolts: bring the tuples together (functions, filters, joins, talking to databases). Topology: how everything is linked together (graph of computations)
Storm Concepts [Diagrams: a stream as a sequence of tuples; spouts emitting multiple streams of tuples; a bolt consuming and emitting tuples; a topology linking spouts and bolts]
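To make the vocabulary concrete, the following toy sketch models spouts, bolts and a topology with plain Python generators. It does not use the Storm API and has none of Storm's distribution or fault tolerance; the function names and sample tweets are hypothetical, and the stream is finite here whereas a real stream is unbounded.

```python
from collections import Counter


def tweet_spout():
    """Spout: the source of a stream. Each tuple is a dict of named values."""
    samples = [("ann", "love this game"), ("bob", "game crashed"), ("ann", "new level")]
    for user, text in samples:
        yield {"user": user, "text": text}


def split_bolt(stream):
    """Bolt acting as a function: emits one tuple per word."""
    for tup in stream:
        for word in tup["text"].split():
            yield {"word": word}


def filter_bolt(stream, stopwords=frozenset({"this"})):
    """Bolt acting as a filter: drops tuples whose word is a stopword."""
    for tup in stream:
        if tup["word"] not in stopwords:
            yield tup


def count_bolt(stream):
    """Bolt acting as an aggregator: keeps a running count per word."""
    counts = Counter()
    for tup in stream:
        counts[tup["word"]] += 1
    return counts


# Topology: the graph linking the spout to the chain of bolts.
word_counts = count_bolt(filter_bolt(split_bolt(tweet_spout())))
```

Each tuple flows through the chain as soon as it is emitted, which is the incremental behavior the slide contrasts with Hadoop's batch processing.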
Storm and Hadoop Integration Example: a very large amount of data needs to be parsed from different sources in real time. Implement Hadoop: create a Map/Reduce job, but it takes a long time to complete because the job is finished only when the slowest reducer finishes. This means unknown cycle completion times
Storm and Hadoop Integration Another complexity could be: we have a number of processes updating our feeds, and would like to control which of them will update the database. We should ask ourselves: is there a system that allows us to deploy the queue / worker system without inheriting the complexity? Yes, it's called Storm
Storm Cluster [Diagram: Nimbus, the master node (similar to the Hadoop JobTracker), coordinates through a ZooKeeper cluster with several Supervisor nodes] ZooKeeper is used for cluster coordination. Supervisors run worker processes
Storm and Hadoop Integration Storm allows us to write real-time topologies. We don't have to worry about: scalability, fail-over, IPC. Real-time analytics!
Technology Filtering Hadoop Freely licensed software framework developed by the Apache Software Foundation. Scales from a single computer up to thousands of computers. Hadoop Distributed File System (HDFS): designed for storing very large files; streaming data access patterns; runs on clusters of commodity hardware
Technology Filtering Hadoop Source: Apache Hadoop
Technology Filtering Hadoop
Functional layer: Sub project
Modeling and development: MapReduce, Pig, Mahout
Storage and data management: HDFS, HBase, Cassandra
Data warehousing, summarization, query: Hive, Sqoop
Data collection, aggregation, analysis: Chukwa, Flume
Metadata, table and schema management: HCatalog
Cluster management, job scheduling, workflow: ZooKeeper, Oozie, Ambari
Data serialization: Avro
Technology Filtering Hadoop Vendors: Amazon Web Services, Datameer, EMC Greenplum, HStreaming, MapR, Pentaho, Zettaset, Cloudera, DataStax, Hortonworks, IBM, Outerthought, Platform Computing
Technology Filtering MapReduce MapReduce is a programming framework popularized by Google and used to simplify data processing across massive data sets. MapReduce 1: classic framework. MapReduce 2: YARN (Yet Another Resource Negotiator), for very large clusters in the region of 4,000 nodes or higher
Technology Filtering MapReduce [Diagram: Start -> Map 1, Map 2, ..., Map N -> Reduce -> End]
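The map-then-reduce flow in the diagram can be illustrated in a single process. The sketch below is plain Python, not the Hadoop API: each inner list plays the role of one input split handled by one map task, and the record layout `player,item,price` is a hypothetical example of in-game purchase data.

```python
from collections import defaultdict


def map_task(record):
    """Map: turn one 'player,item,price' record into an (item, price) pair."""
    _player, item, price = record.split(",")
    return item, float(price)


def shuffle(pairs):
    """Group mapped values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: total revenue for one item."""
    return key, sum(values)


def run_job(splits):
    # Map 1 .. Map N: one conceptual map task per split.
    mapped = [map_task(record) for split in splits for record in split]
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


splits = [
    ["p1,sword,2.0", "p2,shield,1.5"],
    ["p1,sword,2.0", "p3,potion,0.5"],
]
revenue_by_item = run_job(splits)
```

In a real cluster the map tasks run in parallel on different nodes and the shuffle moves data across the network; only the data flow is reproduced here.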
Technology Filtering MapReduce1 Source: Hadoop The Definitive Guide
Technology Filtering MapReduce2 Source: Apache Hadoop
Technology Open Source Analytics There are various open source applications available Mahout R
R Open source R has been integrated to run massively parallel statistical processes directly in Hadoop nodes. Functionality: used for statistical computing and graphics; linear and nonlinear modeling; classical statistical tests; time series analysis; classification; clustering; well-designed publication-quality plots. Runs on Unix and similar platforms, Windows and macOS
Mahout Open source analytics tool that runs on top of systems such as Hadoop's MapReduce paradigm. Apache Mahout is a machine learning engine that provides classification, clustering, and collaborative filtering
Technology Analytics Gap Commercial analytics packages like SPSS and SAS [Diagram: raw data -> processed data]
Video and Social Games PRACTICAL EXAMPLES
Game Analytics Vendors: Kontagent, Flurry, Mixpanel, Totango, Claritics, Google Analytics. Fewer vendors focus on computer and MMO games. No single analytics provider appears to focus on delivering cross-game-platform analytics
Game Analytics Metrics of social games: Daily active users (DAU). Monthly active users (MAU). DAU / MAU ratio. Engagement: measures time spent playing a game. K factor: infection rate of viral game growth as the core and casual player base expands. Average revenue per user (ARPU). Lifetime value (LTV): captures a player's value to the game based on in-game purchases and other monetization-related behaviors, player influence on virality, and net game evangelism
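Several of these metrics reduce to simple arithmetic over session and revenue data, as the sketch below suggests. The definitions are assumptions where the slide does not spell them out: MAU is counted per calendar month (a trailing 30-day window is also common), and K factor is taken as invites sent per user multiplied by the invite conversion rate. All names and sample data are hypothetical.

```python
from datetime import date


def dau(sessions, day):
    """Daily active users: distinct players with a session on the given day."""
    return len({player for player, played_on in sessions if played_on == day})


def mau(sessions, year, month):
    """Monthly active users: distinct players with a session in the month."""
    return len({p for p, d in sessions if (d.year, d.month) == (year, month)})


def arpu(revenue, active_users):
    """Average revenue per user over the same period as the user count."""
    return revenue / active_users if active_users else 0.0


def k_factor(invites_per_user, conversion_rate):
    """Assumed definition: new players generated by each existing player."""
    return invites_per_user * conversion_rate


sessions = [
    ("ann", date(2012, 6, 1)),
    ("bob", date(2012, 6, 1)),
    ("ann", date(2012, 6, 2)),
    ("cat", date(2012, 6, 15)),
    ("dan", date(2012, 6, 20)),
]
daily = dau(sessions, date(2012, 6, 1))
monthly = mau(sessions, 2012, 6)
stickiness = daily / monthly          # the DAU / MAU ratio
monthly_arpu = arpu(100.0, monthly)
viral = k_factor(4, 0.25)
```

Under this definition a K factor above 1 means each player brings in more than one new player on average.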
Game Monetization
Video Game Subscription Imagine a large MMO video game: World of Warcraft, Star Wars: The Old Republic
Video Game Subscription Model
Pay to play: Players must pay a monthly subscription fee
Free to play: Usually involves an upfront software cost but no additional payments
Freemium: Allows players to access game content and play for free but offers options to pay for additional content and access
Video Game Pay to play Game analytics is focused on understanding who the most valuable players are and how they play. Propensity modeling: identify those players with the highest propensity to do one of the following: continue a subscription; return to play a game after a subscription pause; encourage new players to subscribe; become skilled and persuasive guild leaders
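The slide does not name a modeling technique. As one common choice, the sketch below assumes logistic regression, trained with plain stochastic gradient descent, to score the propensity to continue a subscription. The two features (weekly hours played scaled to 0-1, and guild membership) and the toy training data are hypothetical.

```python
import math


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp within range
    return 1.0 / (1.0 + math.exp(-z))


def propensity(features, weights, bias):
    """Predicted probability that the player continues the subscription."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(rows, labels, learning_rate=0.1, epochs=500):
    """Fit logistic regression weights by stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            error = label - propensity(features, weights, bias)
            bias += learning_rate * error
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
    return weights, bias


# Features: [weekly hours played scaled to 0-1, guild member (0 or 1)]
rows = [[0.1, 0], [0.2, 0], [0.3, 0], [0.8, 1], [0.9, 1], [1.0, 1]]
labels = [0, 0, 0, 1, 1, 1]   # 1 = continued the subscription
weights, bias = train(rows, labels)

engaged_score = propensity([0.9, 1], weights, bias)
casual_score = propensity([0.2, 0], weights, bias)
```

Ranking players by such a score is what lets a studio target the highest-propensity group; a real model would use far more features and a held-out validation set.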
Video Game Data elements include traditional game-time dashboard key performance indicators (KPIs): time to complete levels; avatar selection; gender of avatar; game-related tweets; language; solo vs. interactive behaviors; interaction style indicators; game strategy behavior variables; social network activity
Video Game Micro segmentation Involves segmenting a player base to understand distinct segment preferences and behaviors to guide targeted game design, localization that reflects preferences of regional segments, and appealing targeted extension packages and additional content design
Social Game Ad and Virtual Good Monetization Early adopters of: Big Data technology; cloud computing solutions; data mining applications; player analytics, which allows social game studios to understand in real time why users are abandoning a game and to identify other players at risk of leaving, so they can develop player retention strategies before those players quit.
Social Game Ad and Virtual Good Monetization
Where the Game Industry is Going
BrandMeter & Gaming Data PRACTICAL IMPLEMENTATION
Process flow [Diagram components: raw social media comment files, web log data, external files; API and data integration processes; BrandMeter data warehouse; BrandMeter processes (text mining and derived variables, data cleansing and manipulation); BrandMeter social media data mart; BrandMeter segmentation and predictive modeling algorithms; analytic data mart; BrandMeter dashboard reporting]
Cloud Architecture [Diagram components: dashboard reporting; analytics; segmentation; BrandMeter datamarts (demographics, personality, mood, loyalty, geographic, other); databases: MySQL, NoSQL; sources: social media, web, mobile]
Social Media Data source: Facebook/Twitter 1. Search request to Facebook and Twitter: http://search.twitter.com/search.json?q=kroger and https://graph.facebook.com/search?q=kroger&type=post 2. The app receives the results in JSON format
Social Media Data source: Facebook/Twitter 3. The app calls a JSON library that parses the data 4. For each JSON record, the data is split into several fields, such as (Twitter): From_User, Created_At, Mention (actually the message content) 5. The app stores the record in the MySQL DB
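Steps 3 to 5 can be sketched as below. Several things here are assumptions: the payload is a hand-written sample whose `results`, `from_user`, `created_at` and `text` keys are assumed to mirror the search response, the table name `mentions` is hypothetical, and an in-memory SQLite database stands in for MySQL so that the sketch runs without a server.

```python
import json
import sqlite3

# Hypothetical sample shaped like a search response; not real data.
payload = json.dumps({"results": [
    {"from_user": "gamer_ann", "created_at": "Mon, 04 Jun 2012 10:00:00 +0000",
     "text": "Loving the new level!"},
    {"from_user": "gamer_bob", "created_at": "Mon, 04 Jun 2012 10:05:00 +0000",
     "text": "Servers are down again"},
]})


def store_mentions(raw_json, conn):
    """Parse the JSON, split each record into fields, and insert the rows."""
    records = json.loads(raw_json)["results"]            # step 3: parse
    rows = [(r["from_user"], r["created_at"], r["text"])  # step 4: split fields
            for r in records]
    conn.executemany(                                     # step 5: store
        "INSERT INTO mentions (from_user, created_at, mention) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # assumption: SQLite in place of MySQL
conn.execute("CREATE TABLE mentions (from_user TEXT, created_at TEXT, mention TEXT)")
stored = store_mentions(payload, conn)
```

Parameterized inserts matter here because message content is arbitrary user text and must not be interpolated into the SQL string.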
Social Media Data Source [Diagram: social media -> JSON: parse and store results into table -> BrandMeter datamart, with NoSQL / MySQL integration]
Screen shots of the Dashboard / NoSQL / MySQL Data / Analytics Jact Media LLC.
Gaming data provided by Jact Media LLC LIVE DEMO BRANDMETER DASHBOARD