1 Tracking Disease Outbreaks and Health Disparities using Big Data, Social Media, GIS, and Social Network Analysis Dr. Ming-Hsiang Tsou Director of the Center for Human Dynamics in the Mobile Age Professor, Department of Geography San Diego State University February 20, 2015
2 What is Big Data? Image source: (definition from IBM, and WIPRO) Is this a good definition of Big Data?
3 Big Data is Human-Centered Data Big Data is dynamic datasets created by or derived from human activities, communications, movements, and behaviors (Tsou, 2015, in review). The term, big data, should refer to big ideas, big impacts, and big changes for our society rather than only focusing on big volume.
4 Big Data Category (Tsou, 2015). Social life data include popular social media services (Twitter, Flickr, Snapchat, YouTube, Foursquare, etc.), online forums, online video games, and web blogs. Health data include electronic medical records (EMR) from hospitals and health centers, cancer registry data from state and local communities, and official disease outbreak tracking and epidemiology data. Business and commercial data include credit card transactions, online business reviews (such as Yelp and Amazon reviews), supermarket membership records, shopping mall transaction records per store, credit card fraud examination data, enterprise management data, and marketing analysis data. Transportation and traffic data include GPS tracks (from taxis, buses, Uber, bike sharing programs, and mobile phones), traffic sensor data (from subways, trolleys, buses, bike lanes, highways), social media data (from check-ins, Waze, and other social media platforms), and mobile phone data (from data transmission records and cellular network data). Scientific research data include earthquake sensors, weather sensors, satellite images, crowdsourcing data for biodiversity research, volunteered geographic information, and census data.
5 Data Integration / Data Fusion Explore their spatiotemporal relationships in both network space and geographical space. Health Data Layer Image provided by Dr. Atsushi Nara (Associate Director of HDMA Center).
6 The Challenge of Big Data Analytics: Big Data are very Messy, Noisy, and Unstructured! Image Source: Requires collaborative efforts from linguists, computer scientists, data mining experts, statisticians, physicists, modelers, and domain experts. Human Dynamics in the Mobile Age (HDMA)
7 Geography (place and time) is the KEY for Understanding and Integrating Big Data (Tsou and Leitner, 2013) Big Data (information) Time Place
8 Research Showcase #1: Geo-Targeted Social Media Analytics for Tracking Flu Outbreaks in U.S.
10 How to Collect Social Media Data? Source: Twitter Search APIs to collect public tweets (Geo-targeted). Keywords: flu and influenza. Region: 17-mile radius from the center of 31 U.S. cities. Time: September 30, 2013 (Week 40) to March 23, 2014 (Week 12). Twitter Search API: based on the user profile (locations) and gazetteers (San Diego: includes La Jolla, La Mesa, Chula Vista). (FREE for Search APIs)
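The keyword-plus-radius search described on this slide can be sketched as a set of query parameters. This is only an illustration, not the HDMA Center's collection code: the helper name `build_search_params` is hypothetical, the San Diego coordinates are approximate, and the `geocode` string follows the "latitude,longitude,radius" convention of the Twitter Search API v1.1, which may differ from current API versions.

```python
# Hypothetical helper: builds query parameters for one city-centered search.
# The "lat,lon,radius" geocode format follows the Twitter Search API v1.1
# convention; coordinates below are approximate and for illustration only.

def build_search_params(keywords, lat, lon, radius_miles=17):
    """Return a dict of parameters for a keyword search within a radius."""
    query = " OR ".join(keywords)                      # match any keyword
    geocode = f"{lat:.4f},{lon:.4f},{radius_miles}mi"  # circle around the city center
    return {"q": query, "geocode": geocode, "count": 100}

# Assumed (approximate) center of San Diego.
params = build_search_params(["flu", "influenza"], 32.7157, -117.1611)
print(params)
```

No request is sent here; the dictionary only shows how the keywords and the 17-mile region combine into one search.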
11 Collect Tweets from Top 31 U.S. Cities (17-mile radius) 31 different cities across the United States (chosen based on their population sizes): Atlanta, Austin, Baltimore, Boston, Chicago, Cleveland, Columbus, Dallas, Denver, Detroit, El Paso, Fort Worth, Houston, Indianapolis, Jacksonville, Los Angeles, Memphis, Milwaukee, Nashville-Davidson, New Orleans, New York, Oklahoma City, Philadelphia, Phoenix, Portland, San Antonio, San Diego, San Francisco, San Jose, Seattle, and Washington, D.C.
12 Normalizing City Tweeting Rate by Census Data City Population = sum of the populations of all census tracts within the circle. (Different from the official record of City Population)
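A minimal sketch of this normalization, under stated assumptions: a tract counts as "within the circle" when its centroid lies within 17 miles of the city center (the slide does not say how partial overlaps are handled), and the tract coordinates, populations, and tweet count below are made up. The function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def circle_population(center, tracts, radius_miles=17):
    """Sum the populations of tracts whose centroid is inside the circle (assumed rule)."""
    lat0, lon0 = center
    return sum(pop for lat, lon, pop in tracts
               if haversine_miles(lat0, lon0, lat, lon) <= radius_miles)

def rate_per_100k(tweet_count, population):
    """Tweets per 100,000 residents."""
    return 100_000 * tweet_count / population

# Made-up tracts: (centroid latitude, centroid longitude, population).
tracts = [(32.72, -117.16, 4000), (32.80, -117.10, 6000), (33.50, -117.16, 5000)]
pop = circle_population((32.7157, -117.1611), tracts)  # third tract is about 54 miles away
print(pop, rate_per_100k(25, pop))
```

Dividing by the circle population rather than the official city population keeps the denominator consistent with the area from which tweets were actually collected.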
13 Filter and Refine Big Data (Remove Noise) Number of tweets: 10,678; 5,398; 4,947; 4, Machine Learning. Total flu tweets collected: 307,070. Final valid flu tweets: 88,979.
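The totals on this slide mean that roughly 29% of collected tweets survived filtering (88,979 of 307,070). The sketch below shows what a simple rule-based pre-filter might look like before a machine-learning step. The two rules (drop retweets, drop tweets containing links) are assumptions for illustration, the slide does not list the actual filters, and the sample tweets are invented.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

def is_candidate(text):
    """Keep a tweet only if it is not a retweet and has no link (assumed rules)."""
    if text.startswith("RT @"):
        return False
    if URL_PATTERN.search(text):
        return False
    return True

# Invented sample tweets.
sample = [
    "RT @someone: flu season is here",
    "Flu shot clinic schedule http://example.com/clinic",
    "Home sick with the flu again",
]
kept = [t for t in sample if is_candidate(t)]

# Share of collected tweets that were finally judged valid, from the slide totals.
retained_share = 88_979 / 307_070
print(kept, round(retained_share, 3))
```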
14 Real-Time Monitoring of Flu Outbreaks in U.S. (national scale, 31 cities combined), flu season. (R) value = . National ILI (influenza-like illness): two-week delay. National tweet rate (31-city average): real-time update, 10 days earlier than ILI.
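The comparison between weekly tweet rates and ILI rates can be illustrated with a Pearson correlation computed at several time lags. This is a sketch under assumptions: the weekly values are made up (the ILI series is the tweet series shifted by one week), and the function names are hypothetical. It does not reproduce the R value from the slide.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def lagged_r(tweets, ili, lag):
    """Correlate tweets in week t with ILI in week t + lag (lag >= 0)."""
    if lag == 0:
        return pearson_r(tweets, ili)
    return pearson_r(tweets[:-lag], ili[lag:])

# Made-up weekly series: ILI follows the tweet rate by one week.
tweet_rate = [1.0, 2.0, 4.0, 7.0, 5.0, 3.0, 2.0, 1.0]
ili_rate = [0.5, 1.0, 2.0, 4.0, 7.0, 5.0, 3.0, 2.0]

# Lag (in weeks) at which the tweet rate best matches later ILI values.
best_lag = max(range(0, 3), key=lambda k: lagged_r(tweet_rate, ili_rate, k))
print(best_lag)
```

A positive best lag means the tweet signal rises before the ILI signal, which is the sense in which tweets can act as an early warning.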
15 Multi-Level (Scale) Analysis: (Regional and Municipal Scales) Regional ILI Region 4 Region 5
16 ER ILI data is better than SP ILI. Weekly valid flu tweeting rates compared with Emergency Department ILI in different cities. Chart legend: tweet rate per 100,000; sentinel-provided ILI rate; averaged ILI rate (missing data). * P < 0.05; ** P < 0.005; *** P < . The correlation between city-level sentinel-provided ILI rates and tweeting rates is VERY BAD! (SP ILI is not reliable?)
17 Trend Analysis at the Municipal Scale (San Diego) with the Lab-tested confirmed flu cases San Diego: Lab confirmed Flu Cases vs Tweeting Rate: (R) value =
18 Two research papers in the Journal of Medical Internet Research
19 Next Step: Monitor Flu Outbreaks in Real-Time? The HDMA Center is developing the SMART Dashboard: Social Media Analytic and Research Testbed (Beta)
20 SMART Dashboard: Social Media Analytic and Research Testbed. Live demo for 5 minutes. Real-time social media analytics (Trend Analysis, Word Clouds, Top URLs, web pages, Top Hashtags/Mentions/Stories).
21 New Topic: Tracking Ebola
22 Tracking Flu Outbreaks in 2014/2015 Flu Season (Not as smooth as 2013 or 2012)
23 Only GPS-tagged Tweets Collected Now. # of Filtered ILI Tweets, Top 30 US Cities, as of February 9, 2015 (from SMART dashboard). Only 1%-4% of tweets have GPS coordinates. Problems!!! Twitter broke its Search APIs on 11/20/2014, and they now return only GPS-tagged tweets (a 90%-95% reduction in tweets collected). CDC Influenza Positive Tests, National Data Summary, through Weeks 40-3, Season
24 Comparison between ILI and GPS-only Tweets among 30 U.S. Cities Red: National ILI, Purple: GPS Only Tweets Tweeting Rate, multiplied by 10 for flu season
25 The Limitation of Social Media For Flu Outbreak Monitoring
26 Social Media User Profiles. Social media messages cannot represent the whole population, but they can provide warning signals and real-time updates. Some flu viruses may affect older people more than younger people. Survey
27 Only 1% - 4% of GPS Tweets within the Free Garden Hose Tweets (1%) Only 1%-4% of tweets have GPS coordinates. 80% of tweets have city-level location info (user profile). Public (free) Twitter APIs can only collect 1% of total tweets within a region (Garden Hose). But our unique Search API methods can collect up to 100% of requested tweets using a combination of keywords and region. Twitter Firehose (massive, real-time stream of tweets): not free, expensive. Historical Firehose tweets can be purchased from tweet data-reseller partners: GNIP, SocialFlow, MediaSift, etc. (expensive).
28 User Privacy Issue. Concerns about Big Brother. Although all the tweets collected from APIs are public (anyone can search and retrieve them), some tweets may contain personal private information (real names, locations of homes, offices, private conversations, medical situations, etc.).
29 Social Network Analysis (SNA) The top 30 U.S. city whooping cough Twitter message networks (mentions networks): Social Network Nodes, User Profiles. Data collection temporal coverage: July 22 to December 9, 2014. SharylAttkisson: Investigative journalist. Author of Stonewalled. Dreaming of a day when public officials answer questions as if they know they work for the public. NFIDvaccines: National Foundation for Infectious Diseases. Non-profit organization dedicated to educating the public and healthcare professionals about the causes, treatment, and prevention of infectious diseases. Roseperson: Regional support manager, Aligning Forces for Quality, at the AF4Q National Program Office (at George Washington University). Jonrappoport: Freelance investigative reporter for over 30 years. Acognews: The American Congress of Obstetricians and Gynecologists. The leading authority on women's health care. CDCgov: CDC's official Twitter source for daily credible health & safety updates from the Centers for Disease Control & Prevention. Laurie_Garrett: Pulitzer Prize-winning science journalist and writer of I Heard the Sirens Scream (2011) and two bestselling books, The Coming Plague and Betrayal of Trust. CFR_org: The Council on Foreign Relations is a resource for foreign policy news and analysis.
30 Whooping cough message networks in Twitter (among top 30 U.S. cities) Top 30 US Cities Clustering Coefficient (sub-groups, communities)
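For readers unfamiliar with the clustering coefficient, the sketch below computes the local clustering coefficient of each node in a small undirected network: the fraction of a node's neighbor pairs that are themselves connected. The node names and edges are invented, and treating the mentions network as undirected is an assumption made here for simplicity.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbors that are directly linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

# Invented mention edges between hypothetical accounts a, b, c, d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

for node in sorted(adj):
    print(node, round(local_clustering(adj, node), 3))
```

Nodes with high values sit inside tightly knit sub-groups, which is how the coefficient helps reveal communities in a message network.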
31 Top 30 US Cities Whooping Cough Grouping: Major Nodes (information flow). Whooping cough message flows in Twitter (among top 30 U.S. cities). Node labels: LJbe il, acognews, CFR_org, CDCgov, Laurie_Garrett, SharylAttkisson, roseperson, NFIDvaccines, Jonrappoport.
32 Opinion Leaders vs. Broadcasters Top 30 US Cities In Degrees (Tweets were retweeted by others) Top 30 US Cities Out Degrees (Retweeting others' messages)
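The in-degree and out-degree measures on this slide can be sketched with a simple count over retweet pairs. The account names and retweet pairs below are invented. Reading high in-degree as "opinion leader" and high out-degree as "broadcaster" follows the slide title and is an interpretation, not a stated definition.

```python
from collections import Counter

# Invented retweet records: (retweeter, original_author).
retweets = [
    ("u1", "cdc"), ("u2", "cdc"), ("u3", "cdc"),
    ("u1", "news"), ("u1", "u2"),
]

# In-degree: how often an account's tweets were retweeted by others.
in_degree = Counter(author for _, author in retweets)
# Out-degree: how often an account retweeted others' messages.
out_degree = Counter(retweeter for retweeter, _ in retweets)

print(in_degree.most_common(1), out_degree.most_common(1))
```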
33 Next Step: Location-based Social Network Analysis
35 Big Data Research Agenda: Health Disparities and GIS (Geographic Information Systems)
36 Two Important Public Health Challenges Cancer Obesity
37 Health Disparities: Cancer Mortality Rates Source:
38 Multi-Scale, Multi-Layer GIS Analysis. Spatial scale: states, counties, ZIP codes, census tracts, street blocks, individual points. Correlation with census data, weather and climate data, land use data, air pollution, and others?
39 Geostatistics and GIS Modelling. Explore statistical relationships in data. Build geostatistical surfaces. Detect clusters. Spatiotemporal analysis and visualization. Statistical alarm bell: display outlier or influential cases by location. Statistical analysis is also useful in finding zones of significantly higher disease prevalence.
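As a much simpler stand-in for the geostatistical methods named on this slide, the sketch below flags areas whose rate is unusually high using a plain z-score. This is not a spatial statistic (it ignores where the areas are located), the area names and rates are invented, and the threshold of 2 standard deviations is an arbitrary assumption.

```python
import statistics

def flag_high_outliers(rates, threshold=2.0):
    """Return areas whose rate exceeds the mean by more than `threshold` standard deviations."""
    values = list(rates.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return sorted(area for area, r in rates.items() if (r - mean) / sd > threshold)

# Invented disease rates per area (e.g., cases per 100,000).
rates = {"A": 10, "B": 11, "C": 9, "D": 10, "E": 12,
         "F": 10, "G": 11, "H": 9, "I": 30}

flagged = flag_high_outliers(rates)
print(flagged)
```

A real "alarm bell" would normally use a method that accounts for neighboring areas and population sizes; the z-score only illustrates the idea of flagging significantly higher values.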
40 Obesity: Tracking Human Behaviors and Activities using Mobile Devices (wearable computing)
41 GPS Data Analytics (Criminal GPS monitoring). Image provided by Professor May Yuan (U of Texas at Dallas) and Assistant Professor Atsushi Nara (SDSU). Sex offenders monitored by GPS: at rest, every hour; in motion, every minute. Can we apply similar algorithms to analyze obesity problems?
42 Human Dynamics in the Mobile Age (HDMA) Center of Research Excellence at San Diego State University Established in May 2014 Human Dynamics Spatial Science Mobile Technology
43 Thank You Q & A Director: Dr. Ming-Hsiang Tsou Funded by NSF Cyber-Enabled Discovery and Innovation (CDI) program. Award # ( ) NSF Interdisciplinary Behavioral and Social Science (IBSS) Program, Award # ( ): Spatiotemporal Modeling of Human Dynamics Across Social Media and Social Networks.