Big Data, Social Networks, and Human Behavior
Jukka-Pekka Onnela, Harvard University
Big Data for Development
United Nations Headquarters, New York City
July 10, 2012
Overview
Progress in science has always been driven by data
Explosion in the amount and type of data
Big data refers to large and complex data sets
  Often multidimensional, longitudinal, digitally generated
The big data phenomenon has its origin in Moore's law:
  The number of transistors on integrated circuits doubles about every 18 months
  Sensors are cheaper, smaller, everywhere
  Enhanced computational capacity
http://en.wikipedia.org/wiki/moore%27s_law
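To make the doubling claim concrete, here is a minimal sketch of the growth it implies. The function name `growth_factor` is hypothetical, and the 18-month doubling period is taken from the slide as a rule of thumb, not as a measured constant.

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Multiplicative growth after `years` if a quantity doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    # Illustrative horizons only; the doubling period is the slide's figure.
    for years in (1.5, 3, 15, 30):
        print(f"{years:>4} years: x{growth_factor(years):,.0f}")
```

Under this assumption, 15 years correspond to ten doublings, which is roughly a thousandfold increase.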
http://www.boston.com/bigpicture/2008/08/the_large_hadron_collider.html
Large Hadron Collider (LHC)
The Large Hadron Collider (LHC) at CERN is the biggest machine ever built
The largest underground ring has a circumference of 27 kilometers (17 miles)
1232 dipole magnets, each 15 meters long and weighing 30 tons
The vacuum is one ten-trillionth of an atmosphere
Experiments generate 100 MB of data (particle trajectories) each second
Higgs boson
http://www.runfam.com/2011/10/why-the-hare-may-never-beat-the-tortoise-zenos-paradox-the-paradox-of-motion/
Networks and mobile sensing
Mobile phones have been used in the past few years to study the structure of human social and communication networks
Networks consist of nodes (actors) and ties (interactions)
The field of research that studies networks, their structure and function, is called network science (in physics and mathematics) or social network analysis (in sociology and statistics)
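The node and tie vocabulary maps directly onto a simple data structure. The sketch below is one possible representation, not the one used in the talk; the names `build_network` and `calls`, and the four people in the example, are hypothetical.

```python
from collections import defaultdict


def build_network(ties):
    """Return an undirected adjacency map {node: set(neighbors)} from (a, b) pairs."""
    neighbors = defaultdict(set)
    for a, b in ties:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)


# Hypothetical interaction records: each pair is one observed tie.
calls = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]

network = build_network(calls)

# Degree = number of distinct people a node is tied to.
degree = {node: len(nbrs) for node, nbrs in network.items()}
```

Storing neighbors as sets keeps repeated calls between the same two people from being counted as separate ties.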
Networks and mobile sensing
Network theory, when applied to social networks, has a simple premise:
  People are connected, therefore our health is connected
  People are connected, therefore our economic wellbeing is connected
Mobile phones have enormous potential for the study of human social networks and human behavior in vivo, in a natural context outside laboratories
Social behavior has remained essentially unchanged for millennia, but now, for the first time, we have the opportunity to study it at large scale
Networks and mobile sensing
Besides communication, smartphones have sensors and computing capabilities
These possibilities have led to a new research field called mobile phone sensing
Mobile sensing has evolved in the past few years for several reasons:
  (1) Availability of cheap embedded sensors
      Gyroscope, compass, accelerometer, proximity sensor, ambient light sensor, two cameras, microphone, GPS, WiFi, Bluetooth
  (2) Smartphones are programmable
  (3) Software can be easily distributed
  (4) Significant computational power (phone & cloud)
Each phone can generate 1 kB of data per second (conservative)
http://desktopwallpaperdownload.files.wordpress.com/2012/02/network-space-lights-planets-high-wallpapers-full-hd.jpg
Networks and mobile sensing
Expect 6 billion phone subscriptions by the end of 2012
At 1 kB per second per phone, this results in 6 million MB / second, or 6 TB / second, of data
This is 60,000 times more data than CERN generates (conservative)
Twofold opportunity:
  Use mobile phone sensing to learn about the individuals (nodes)
  Use mobile phone communication patterns and network theory to learn about the structural connections between individuals (ties)
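The back-of-the-envelope figures above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: decimal units (1 kB = 10^3 bytes, 1 MB = 10^6 bytes, 1 TB = 10^12 bytes), the slide's 1 kB per second per phone, and the 100 MB per second quoted earlier for the LHC experiments. The constant names are mine.

```python
# Inputs taken from the slides; decimal (SI) units are an assumption.
PHONES = 6_000_000_000           # subscriptions expected by the end of 2012
BYTES_PER_PHONE_PER_S = 1_000    # 1 kB per phone per second
LHC_BYTES_PER_S = 100 * 10**6    # 100 MB per second

total_bytes_per_s = PHONES * BYTES_PER_PHONE_PER_S

total_mb_per_s = total_bytes_per_s // 10**6    # megabytes per second
total_tb_per_s = total_bytes_per_s // 10**12   # terabytes per second
ratio_to_lhc = total_bytes_per_s // LHC_BYTES_PER_S

print(f"{total_mb_per_s:,} MB/s = {total_tb_per_s} TB/s, "
      f"{ratio_to_lhc:,} times the LHC rate")
```

The estimate scales linearly in both inputs, so halving the per-phone rate or the number of active phones halves the ratio.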
Phone calls and texts in a European network
20% market share
18 weeks (126 days)
Private subscriptions
N = 7M; L = 23M
Animation by Mikko Kivelä, Aalto University
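If N is read as the number of nodes and L as the number of undirected ties (an interpretation the slide does not spell out), two summary statistics follow directly. The helper names `mean_degree` and `density` are hypothetical.

```python
def mean_degree(n_nodes: int, n_links: int) -> float:
    """Average number of ties per node; each undirected tie has two endpoints."""
    return 2 * n_links / n_nodes


def density(n_nodes: int, n_links: int) -> float:
    """Fraction of all possible node pairs that are actually tied."""
    possible_pairs = n_nodes * (n_nodes - 1) / 2
    return n_links / possible_pairs


# Values from the slide, under the interpretation stated above.
N = 7_000_000
L = 23_000_000

print(f"mean degree = {mean_degree(N, L):.2f}")
print(f"density     = {density(N, L):.2e}")
```

Under these assumptions each subscriber has between six and seven contacts on average, while fewer than one in a million possible pairs is connected, so the network is very sparse.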
Tie strengths in social networks
The weak ties hypothesis
Mark Granovetter, The strength of weak ties, American Journal of Sociology 78, 1360, 1973
Tie strengths in social networks
Revisiting the hypothesis with aggregated cell phone data
[Figure: tie strength (example tie labels: 3 min, 5 min, 7 min, 15 min (3 calls)) and fraction of friends in common]
Onnela, Saramäki, Hyvönen, Szabó, Lazer, Kaski, Kertész, Barabási, Structure and tie strengths in mobile communication networks, PNAS 104, 7332, 2007
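The "fraction of friends in common" for a tie can be sketched as follows. The formula used here, n_ij / ((k_i - 1) + (k_j - 1) - n_ij), where n_ij is the number of common neighbors and k_i, k_j are the degrees of the endpoints, is my understanding of the overlap measure in the cited paper and should be treated as an assumption. The toy network and the function name `overlap` are hypothetical.

```python
def overlap(neighbors, i, j):
    """Fraction of i's and j's other contacts that are shared by both.

    Computes n_ij / ((k_i - 1) + (k_j - 1) - n_ij); returns 0.0 when
    neither endpoint has any other contact.
    """
    common = len(neighbors[i] & neighbors[j])
    others = (len(neighbors[i]) - 1) + (len(neighbors[j]) - 1) - common
    return common / others if others > 0 else 0.0


# Hypothetical undirected toy network as {node: set(neighbors)}.
neighbors = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}

for i, j in [("a", "b"), ("a", "d"), ("d", "e")]:
    print(f"overlap({i}, {j}) = {overlap(neighbors, i, j):.2f}")
```

In this toy example the tie a-b sits inside a tightly knit group and has full overlap, while d-e is a bridge to an otherwise unconnected node and has none. Comparing such overlap values with a tie's strength (for example, total call minutes) is one way to examine the weak ties hypothesis on call data.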
Tie strengths in social networks
Onnela, Saramäki, Hyvönen, Szabó, Lazer, Kaski, Kertész, Barabási, Structure and tie strengths in mobile communication networks, PNAS 104, 7332, 2007
Pulse of the nation: Mood from Twitter
A. Mislove, S. Lehmann, Y.-Y. Ahn, J.-P. Onnela, J. N. Rosenquist, Understanding the Demographics of Twitter Users, Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, Barcelona, Spain, 2011
Thank you
jponnela.com