Big Data Analytics for Social and Behavioral Sciences
  Jaideep Srivastava
  Professor, Computer Science, University of Minnesota (srivasta@cs.umn.edu)
  Co-Founder, CTO, Ninja Metrics, Inc. (jaideep@ninjametrics.com, www.ninjametrics.com)
  CRIS Symposium, May 2nd, 2013
Talk Outline
  Examples of social/behavioral big data
  Why study virtual worlds and games
  What social/behavioral sciences tell us
  Impact on Science: Dynamics of online trust
  Impact on Business: Loyalty and influence in CRM
  So what does Ninja Metrics do?
  Concluding remarks
Examples of Social/Behavioral Big Data
Example: Tweets for Japanese Tsunami
  [Figure: global retweets of tweets coming from Japan for one hour after the earthquake. Legend: Original, Retweet]
Example: Churn in Subscription Games
  [Figure: ratio of quitters to stayers (0.00 to 6.00) by character level (1 to 69), Solo vs. Social players]
  Social network: likelihood of quitting
    Solo players: 65.3%
    Connected to small or medium networks: 34.8%
    In the biggest network: 5.7%
  Isolated players are 3.5x more likely to quit (B = 1.26, p < .001).
  Focus design on facilitating social interaction.
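The "3.5x" figure is consistent with the reported coefficient if B = 1.26 is read as a logistic-regression coefficient for an "isolated player" indicator. That reading is an assumption, since the slide does not name the model. Under it, exponentiating B gives the odds ratio:

```python
import math

# Assumption: B = 1.26 is a logistic-regression coefficient for an
# "isolated player" indicator; the slide does not state the model.
B = 1.26

# exp(B) is the multiplicative change in the odds of quitting.
odds_ratio = math.exp(B)

print(f"odds ratio = {odds_ratio:.2f}")
```

Note that an odds ratio is a ratio of odds, not of probabilities, so "3.5x more likely" is the usual shorthand rather than an exact statement about quit rates.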
Levi's Example of Social Retail
  Levi's leverages its brand to get customers to share their social network
  Levi's can use predictive social analytics technology to understand the value of the customer's social network
Opportunity, Innovation, Impact
  Companies do not understand the social graph of their customers
    It's not just about how they relate to their customers, but also about how customers relate to each other
    Understanding these relationships unlocks immense value
  Innovation: Understanding the social network of customers
    Key influencers, relationship strength, ...
  Impact: Deriving actionable insights from this understanding
    Customer acquisition, retention, customer care, ...
    Social recommendation, influence-based marketing, identifying trend-setters, ...
  Ninja Metrics confidential information. Copyright 2012
Why Study Games & Virtual Worlds?
Player Behavior & Revenue Model
  Blizzard (subscription)
    World of Warcraft
    12 million subscribers
    Revenue model: $15/month
    Approx. $3 billion annual revenue
    4 hours a day, 7 days a week!
    Hard-core gamers
    Less socially acceptable
    Like cocaine
  Zynga (free-to-play)
    Farmville, Fishville, Mafia Wars, etc.
    180 million players
    Revenue model: virtual goods
    $700 million in 2010
    0.5 hours a day, 7 days a week
    Everyone
    More socially acceptable
    Like caffeine
MMORPG Data Sets
  MMORPG: Massively Multiplayer Online Role-Playing Game
  Players assume characters in a fantasy world
  On average, each player spends 22 hours a week
  World of Warcraft had 10 million subscribers as of Feb 2012
  MMORPGs are a $20 billion industry
  Several in-game relationships: chat, trade, mentoring, and housing
  Helpful for understanding the social processes underlying a society
EQ2 Data Set
  Chat: exchanging in-game messages and invitations with other players
    Nodes: 349,654; Edges: 86,948,748; Period: 1 month
  Trade: exchanging, buying, or selling weapons and other in-game items
    Nodes: 295,055; Edges: 28,594,929; Period: 9 months
  Mentoring: assisting lower-level players in order to increase the mentor's experience points
    Nodes: 86,495; Edges: 11,913,994; Period: 9 months
  Housing (trust): accumulating and storing in-game items; sharing a house with an in-game partner so that the partner can store in-game items
    Nodes: 63,918; Edges: 128,048; Period: 9 months
Multiple Networks in an Online Game
  [Figure: four network panels: Partnership, Instant messaging, Trade, Mail. Node color: black = male, red = female]
One Day Snapshot of Various Networks
  [Figure: four network panels: Chat (k-core = 7), Trade (k-core = 2), Mentoring, Housing]
  Node color represents the community in all graphs
  Trade and Chat networks are filtered by k-core
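The k-core filter mentioned above keeps only nodes that still have at least k neighbors after all lower-degree nodes are peeled away. A minimal sketch follows, assuming the graph is treated as undirected and given as a list of node pairs; the function name and the toy edge list are hypothetical, not the pipeline used for the slide.

```python
def k_core(edges, k):
    """Return the node set of the k-core of an undirected graph."""
    # Build a symmetric adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    # Repeatedly peel off nodes with fewer than k remaining neighbors.
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in adj.items() if len(nbrs) < k]:
            for nbr in adj[node]:
                adj[nbr].discard(node)
            del adj[node]
            changed = True
    return set(adj)

# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
core2 = k_core(toy_edges, 2)
core3 = k_core(toy_edges, 3)
print(sorted(core2), sorted(core3))
```

In the toy graph the pendant node d is removed at k = 2, and nothing survives at k = 3, which is why a higher threshold such as the Chat network's k = 7 leaves only the densest part of a graph.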
In-game relationships
  CHAT: period of interaction: instantaneous; level of trust: low
  TRADE: period of interaction: instantaneous; level of trust: medium
  MENTORING: period of interaction: long; level of trust: high
  HOUSING: period of interaction: long; level of trust: very high
  Graph density: high (CHAT) to low (HOUSING)
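Graph density, the quantity the slide orders from high to low, can be sketched as the fraction of possible directed links that are present. The version below assumes a simple directed graph, so repeated temporal edges collapse to one link, and it counts only nodes that appear in at least one edge. The edge lists are hypothetical toy data, not the EQ2 networks.

```python
def directed_density(edges):
    """Density of a simple directed graph: unique edges / (n * (n - 1))."""
    # Collapse repeated (temporal) edges and drop self-loops.
    unique = {(u, v) for u, v in edges if u != v}
    nodes = {n for edge in unique for n in edge}
    n = len(nodes)
    return len(unique) / (n * (n - 1)) if n > 1 else 0.0

# Hypothetical toy data: a chat-like graph with many links among few players,
# and a housing-like graph with few links among more players.
chat_like = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "b"), ("a", "b")]
housing_like = [("a", "b"), ("c", "d")]

chat_density = directed_density(chat_like)
housing_density = directed_density(housing_like)
print(f"{chat_density:.3f} {housing_density:.3f}")
```

Raw temporal edge counts like those reported for these data sets cannot be plugged into this formula directly, because each pair of players may interact many times.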
Degree Distribution for Various Networks
  [Figure: degree distribution plots for the networks]
EQ2 Graph Summary
  Graphs: Mentor Network, Trading Graph, Housing Graph, Chat Graph
  Period: 9 months (01-JAN-2006 to 11-SEP-2006); Nodes: 42,451 (43K); Edges: 28,594,929 (29M); Directed: yes; Temporal: yes
  Period: 9 months (01-JAN-2006 to 11-SEP-2006); Nodes: 54,287 (55K); Edges: 1,045,521,695 (1B); Directed: yes; Temporal: yes
  Period: 9 months (01-JAN-2006 to 11-SEP-2006); Nodes: 62,427 (63K); Edges: 1,962,734,099 (2B); Directed: yes; Temporal: yes
  Period: 1 month 10 days (29-JUL-2006 to 10-SEP-2006); Nodes: 349,654 (350K); Edges: 86,948,748 (87M); Directed: yes; Temporal: yes
  The granularity of each network is in seconds
CR3 Graph Summary
  Graphs: Friendship Graph, Mentoring Graph, Chat Graph, Quest Graph
  Period: 5 months (08-MAY-2010 to 30-SEP-2010); Nodes: 86,614 (87K); Edges: 1,560,303 (1.5M); Directed: yes; Temporal: yes
  Period: 5 months (08-MAY-2010 to 30-SEP-2010); Nodes: 64,003 (64K); Edges: 188,002 (188K); Directed: yes; Temporal: yes
  Period: 1 month (11-OCT-2010 to 09-NOV-2010); Nodes: 11,830 (12K); Edges: 107,382,408 (107M); Directed: yes; Temporal: yes
  Period: 1 month; Nodes: 53,836 (54K); Edges: 5,521,156 (5.6M); Directed: no; Temporal: yes
  The granularity of each network is in seconds
EVE Graph Summary
  Transaction Log
    Period: 02/25/2011 to 05/26/2011
    Nodes: 5,000; Edges: 4,975,181
    Directed: yes; Temporal: yes
    Granularity: minutes
  Email Log
    Period: 05/06/2003 to 07/06/2011
    Nodes: 5,391,685; Edges: 40,680,105
    Granularity: minutes
What Do Social & Behavioral Sciences Tell Us?
History of Social Network Analysis
  [Figure: contributing disciplines (Anthropology, Organizational Theory, Social Psychology, Epidemiology, Sociology) around a grid of network types]
    Perception: Socio-Cognitive Networks (acquaintance, links); Cognitive Knowledge Networks (knowledge, content)
    Reality: Social Networks (acquaintance, links); Knowledge Networks (knowledge, content)
  Social science networks have widespread application in various fields
  Most of the analysis techniques have come from Sociology, Statistics, and Mathematics
  See (Wasserman and Faust, 1994) for a comprehensive introduction to social network analysis
Why do we create and sustain networks?
  Theories of self-interest
  Theories of social and resource exchange
  Theories of mutual interest and collective action
  Theories of contagion
  Theories of balance
  Theories of homophily
  Theories of proximity
  Theories of co-evolution
  Sources:
    Contractor, N. S., Wasserman, S. & Faust, K. (2006). Testing multi-theoretical multilevel hypotheses about organizational networks: An analytic framework and empirical example. Academy of Management Review.
    Monge, P. R. & Contractor, N. S. (2003). Theories of Communication Networks. New York: Oxford University Press.
Structural signatures of Social Theories
  [Figure: example network motifs over nodes A to F, with signed (+/-) ties. Panel titles: Self interest, Exchange, Balance, Collective Action, Homophily, Contagion. Other labels in the figure: Government, Industry, Novice, Expert]
Application Successes
  Numerous in social sciences
  Google PageRank
  LinkedIn: expanding your Cognitive Social Network, making you aware that you're more connected and closer than you think you are
  Expertise discovery in organizations
    Knowledge experts, authorities
    Well-connected individuals, hubs
  Rapid-response teams in emergency management
  Information flow in organizations
  Twitter: real-time information dissemination
  Etc.
Impact on Science: Dynamics of Online Trust
Trust Relationship
  Each player can carry only a limited number of items at a time
  A player buys a house to store excess in-game items
  The house is shared with an in-game partner until the owner revokes the partner's permission to access the house
  There are several levels of access permission:
    TRUSTEE: the partner can enter, store, and move items in and out of the house
    FRIEND: the partner can enter, and can store and move only his or her own items
    VISITOR: the partner can enter and see the house
    NONE: the partner can see the house from outside
    REMOVE: the partner cannot see the house
  Do players prefer a specific trust level?
  Is there any stable trust level?
  Do players express higher trust levels more quickly than lower ones?
Trust Dynamics
  [Figure: transition matrix between trust states; rows and columns: BEG, REMOVE, NONE, VISITOR, FRIEND, TRUSTEE, END]
  1. Frequency of Expression: People express stronger relationships more often than weaker relationships. See the total count of the upper triangular part compared to the lower.
  2. Stability of Trust: TRUSTEE is the predominantly preferred and most stable state compared to all other states. See BEG->TRUSTEE and TRUSTEE->EOD.
  3. Reduction of Trust: When people reduce their trust level, they move to REMOVE more often than to any other state. Compare the REMOVE column with the other columns.
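A transition matrix like the one discussed above can be tallied from per-relationship state histories. The sketch below is illustrative only: the histories are hypothetical, and the helper name and the padding of each history with BEG and END are assumptions rather than the authors' procedure.

```python
from collections import Counter

# Trust levels ordered from weakest to strongest, as on the slide.
LEVELS = ["REMOVE", "NONE", "VISITOR", "FRIEND", "TRUSTEE"]
RANK = {level: i for i, level in enumerate(LEVELS)}

def transition_counts(histories):
    """Count state-to-state transitions, padding each history with BEG and END."""
    counts = Counter()
    for history in histories:
        path = ["BEG", *history, "END"]
        for src, dst in zip(path, path[1:]):
            counts[(src, dst)] += 1
    return counts

# Hypothetical histories: one list of successive trust levels per relationship.
histories = [
    ["TRUSTEE"],
    ["FRIEND", "TRUSTEE"],
    ["TRUSTEE", "REMOVE"],
    ["VISITOR", "FRIEND"],
]
counts = transition_counts(histories)

# Upper triangle: moves to a stronger level. Lower triangle: moves to a weaker one.
upgrades = sum(c for (s, d), c in counts.items()
               if s in RANK and d in RANK and RANK[d] > RANK[s])
downgrades = sum(c for (s, d), c in counts.items()
                 if s in RANK and d in RANK and RANK[d] < RANK[s])
print(upgrades, downgrades, counts[("BEG", "TRUSTEE")])
```

Comparing `upgrades` with `downgrades` corresponds to the upper versus lower triangle comparison in point 1, and the `("BEG", "TRUSTEE")` entry corresponds to the first cell cited in point 2.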
Longitudinal Analysis of Trust Dynamics
  People switch to the trust state much more quickly
  Most transitions happen during the first few days of trust establishment
  The longer a relationship stays in its current state, the less likely it is to move out of that state
Evolving Trust Network
Reciprocation in Granting Trust
  Trust forward links:
    Responses received: 16904/72445 = 23.3%
    No response: 54273/72445 = 74.9%
    Second or more interaction: 1268/72445 = 1.75%
  [Figure: distribution of response times for responses received. Schematic: A grants trust to B, B responds]
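The three shares above partition the 72,445 forward trust links, which can be checked with a few lines of arithmetic. The counts come from the slide; the variable names are only illustrative.

```python
# Counts reported on the slide for forward trust links.
total_links = 72445
outcomes = {
    "responses received": 16904,
    "no response": 54273,
    "second or more interaction": 1268,
}

# The three outcomes should account for every forward link.
covers_all = sum(outcomes.values()) == total_links

# Share of each outcome, in percent.
shares = {name: 100 * count / total_links for name, count in outcomes.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```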
Reciprocation in Revoking Trust
  Forward links that received a response (8452)
    Cancelled (1053)
      Received cancel response (207)
      Received no cancel response (846)
    Never cancelled (7399)
  Backward links that responded (8452)
    Responded for cancel (207)
    Never cancelled (8245)
      Received a cancel request, did not respond (846)
      Never received a cancel request (7399)
  1053 of the 8452 forward links were cancelled (12.5%), and 207 of those cancellations received a response (19.6%)
Revoking Trust - Response Time Distribution
  Most responses arrive within the first few days
  The mean response time for a trust response is 26.7 days, which is much lower than the 31.9 days for a cancellation
  People are more responsive to a trust request than to its cancellation
  Similarly, 23.3% of trust requests receive a response, whereas only 19.6% of cancel requests do
Trust and Socialization
  Trust is a hidden variable
  Measurable indirectly through observable proxies
  Social activities are strongly correlated with trust
  [Figure: positive feedback loop between social activities (measurable) and trust (not measurable)]
Socialization and Trust Granting
Is there a social hysteresis?
  Magnetic Hysteresis
    Polarity changes require equal effort
    Ease of magnetization depends on the magnetic material
    Depends on the strength of the magnetic field
  Social Hysteresis
    Trust is harder to build than distrust
    Ease of trust formation depends on the character of the people involved
    Depends on the type of social interaction
Robust Predictors of Trust Formation
  Problem: Predictive models of trust formation in social networks
  Goal: To find robust predictors of trust formation in different social networks, in environments where more than one type of social relationship exists between two actors
Features Considered
  Node-based
    Average and Difference of Avatar Age
    Average and Difference of Character Level
    Human Gender and Country Indicator
    Average and Difference of Human Age
  Topological
    Difference in degree centrality
    Sum and Difference of Node Degree
    Shortest distance
    Common neighbors
    Sum clustering index
    Salton Index
    Jaccard Index
    Sorensen Index
    Adamic-Adar Index
    Resource Allocation Index
  Cross-network
    Indicates the presence or absence of other social relationships during the training period within a prediction task
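Several of the topological features listed above are neighborhood-overlap scores for a candidate pair of nodes. The sketch below uses their standard textbook definitions on an undirected graph; whether the study computed them exactly this way is an assumption, and the function name and toy graph are hypothetical.

```python
import math

def similarity_indices(adj, x, y):
    """Neighborhood-overlap scores for nodes x and y of an undirected graph."""
    nx, ny = adj[x], adj[y]
    common = nx & ny
    union = nx | ny
    kx, ky = len(nx), len(ny)
    return {
        "common_neighbors": len(common),
        "salton": len(common) / math.sqrt(kx * ky) if kx and ky else 0.0,
        "jaccard": len(common) / len(union) if union else 0.0,
        "sorensen": 2 * len(common) / (kx + ky) if kx + ky else 0.0,
        # Shared neighbors with low degree count for more in the last two scores.
        "adamic_adar": sum(1 / math.log(len(adj[z])) for z in common
                           if len(adj[z]) > 1),
        "resource_allocation": sum(1 / len(adj[z]) for z in common),
    }

# Hypothetical toy graph as a symmetric adjacency map.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"b"},
}
scores = similarity_indices(adj, "a", "b")
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In the toy graph, a and b share two neighbors out of three distinct ones, so the Jaccard score is 2/3, while the Adamic-Adar and Resource Allocation scores weight each shared neighbor by its own degree.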
Robust Link Predictors
  Rank | EQII feature (generalized description) | CR3 feature (generalized description)
  1 | Char Level Avg (agent's level of expertise) | Avatar Age Avg (agent's experience level)
  2 | Char Level Difference (agent's level of expertise) | Sum Degree (propensity to connect)
  3 | Avatar Age Avg (agent's experience level) | Shortest Distance (proximity in network)
  4 | Shortest Distance (proximity in network) | Avatar Age Difference (agent's experience level)
  5 | Sum Degree (propensity to connect) | Sum Clustering Index (completeness of the ego-network)
  6 | Adamic-Adar Index (based on shared neighbors) | Diff Degree (propensity to connect)
  Key Results
    Shortest Distance is a good predictor in all types of networks
    Propensity to connect/communicate (degree sum) is also a good predictor across all types of networks
    In activity-oriented networks, similarity in the experience levels of two nodes is a good indicator of trust formation
Conclusions and Future Work
  Key conclusions
    Multiplayer games provide a great crucible for studying social dynamics in a highly nuanced manner, using graph analytics
    A longitudinal study of the trust relationship in EQ2 provides new insights into the social dynamics of how trust is formed, how it is revoked, the impact of two-person interaction on the community, etc.
    Multiple social relationships among the same group of individuals (e.g., trust, mentoring, trade, chat) provide an opportunity to study how one type of relationship impacts another
    Structural link prediction algorithms can be made much more effective and accurate by bringing in social science knowledge
  Future work
    Study other aspects of the trust relationship
    Use this approach to study other relationships and the correlations among them
Concluding Remarks
Impact of radically new instrumentation
  1950s: Invention of the electron microscope fundamentally changed chemistry, from playing with colored liquids in a lab to truly understanding what's going on
  1970s: Invention of gene sequencing fundamentally changed biology from a qualitative field to a quantitative field
  1980s: Deployment of the Hubble (and other) space telescopes has had a fundamental impact on astronomy and astrophysics
  2000s: Massive adoption of online social apps is fundamentally changing social science research
  Social systems (e.g., FB, Google+, LinkedIn, Twitter, online games) are the new macroscopes of human behavior
The Virtual World Observatory
  http://129.105.161.80/wp/
  Four PIs, 30+ post-docs, PhD and MS students, UGs, high-schoolers
    Noshir Contractor, Northwestern: Networks
    M. Scott Poole, Illinois Urbana-Champaign/NCSA: Groups
    Jaideep Srivastava, Minnesota: Computer Science
    Dmitri Williams, USC: Social Psychology
  Collaborators
    Castronova (Sociology, Indiana), Yee (Xerox PARC), Consalvo, Caplan (Economics, Delaware), Burt (Sociology, U of Chicago), Adamic (Info Sci, Michigan), ...
  Data, technology, funding partners
    Sony (EverQuest 2), Linden Labs (2nd Life), Bungie (Halo3), Kingsoft (Chevalier's Romance), others
    Cloudera Systems (Hadoop), Microsoft (SQL Server), Weka
    NSF, DARPA, CDC, ARL, ARI, IARPA, ...
Collaborators, Sponsors, Partners
  [Figure: logos grouped under Team, Financial Sponsors, Data Partners, Technology Partners]
Thank you for your attention!