1 Spatial Big Data
Shashi Shekhar
McKnight Distinguished University Professor
Department of Computer Science and Engineering, University of Minnesota
AAG-NIH Symp. on Enabling a National Geospatial Cyberinfrastructure for Health Research (July 2012)
More details in S. Shekhar et al., "Spatial Big Data Challenges Intersecting Mobility and Cloud Computing," ACM SIGMOD Workshop on Data Engineering for Wireless and Mobile Access, 2012.
2 Research Theme 1: Spatial Databases
Evacuation Route Planning
Parallelize Range Queries
Shortest Paths
Storing graphs in disk blocks
[Figure legend: only in old plan; only in new plan; in both plans]
3 Theme 2: Spatial Data Mining
Location prediction: nesting sites
Spatial outliers: sensor (#9) on I-35
Co-location Patterns
Teleconnections
[Figure panels: Nest locations; Distance to open water; Vegetation durability; Water depth]
4 Outline
Motivation
What is Spatial Big Data (SBD)?
SBD and Science
SBD Analytics
Conclusions
5 Big Data
"Mining and analyzing these big new data sets can open the door to a new wave of innovation, accelerating productivity and economic growth. Some economists, academics and business executives see an opportunity to move beyond the payoff of the first stage of the Internet, which combined computing and low-cost communications to automate all kinds of commercial transactions."
Estimated value: > USD 1 trillion per year by 2020
  Location-based service: USD 600 B
  Health Informatics: USD 300 B
  Manufacturing:
6 Spatial Big Data Definitions
Spatial datasets exceeding capacity of current computing systems
  To manage, process, or analyze the data with reasonable effort
  Due to Volume, Velocity, Variety, ...
SBD Components
  Data-intensive Computing: Cloud Computing
  Middleware, e.g., MapReduce, Pregel, Bigtable, ...
  Big Data analytics, e.g., data mining, machine learning, computational statistics, ...
  Big Data science and societal applications
Ex. Social media datasets, e.g., Google Flu Trends
  Which patterns may be detected in these datasets? Flu outbreaks?
7 Traditional Spatial Data
Spatial attribute: Neighborhood and extent
Geo-Reference: longitude, latitude, elevation
Spatial data genre
  Raster: geo-images, e.g., Google Earth
  Vector: point, line, polygons
  Graph, e.g., roadmap: node, edge, path
[Figures: Raster Data for UMN Campus (Courtesy: UMN); Graph Data for UMN Campus (Courtesy: Bing); Vector Data for UMN Campus (Courtesy: MapQuest)]
8 Raster SBD
Data Sets >> Google Earth
  Geo-videos from UAVs, security cameras
  Satellite Imagery (periodic scan), LiDAR
  Geo-sensor networks
  Climate simulation, EPA Air Quality
Example use cases
  Patterns of Life
  Change detection, Feature extraction, Urban terrain
[Figure panels: LiDAR & Urban Terrain; Average Monthly Temperature; Feature Extraction; Change Detection (Courtesy: Prof. V. Kumar)]
9 Use Case: Patterns of Life, e.g., activity space
Weekday GPS track for 3 months
Patterns of life
  Activity Space: Usual places and visits
  Rare places, Rare visits
[Table: visits by time of day (Morning 7am - 12noon; Afternoon 12noon - 5pm; Evening 5pm - 12midnight; Midnight 12midnight - 7am; Total) and by place (Home, Work, Club, Farm, Total)]
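The activity-space table on this slide can be approximated by counting visits per time slot and place. The following is a minimal sketch, not the method behind the slide: the visit records, the slot boundaries, and the names `time_slot`, `activity_space`, and `rare_threshold` are all hypothetical.

```python
from collections import Counter

def time_slot(hour):
    """Map an hour of day (0-23) to a coarse slot; boundaries are assumed."""
    if 7 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 24:
        return "Evening"
    return "Midnight"

def activity_space(visits, rare_threshold=1):
    """Tabulate (slot, place) visit counts and flag rarely visited places."""
    table = Counter((time_slot(hour), place) for place, hour in visits)
    per_place = Counter(place for place, _ in visits)
    rare = sorted(p for p, c in per_place.items() if c <= rare_threshold)
    return table, per_place, rare

# Made-up (place, hour) records standing in for places derived from a GPS track.
visits = [
    ("Home", 6), ("Work", 9), ("Work", 14), ("Home", 19), ("Home", 23),
    ("Work", 10), ("Club", 18), ("Home", 2), ("Farm", 15),
]
table, per_place, rare_places = activity_space(visits)
print(per_place)
print("rare places:", rare_places)
```

Places at or below `rare_threshold` visits play the role of the slide's "rare places"; a real pipeline would first have to cluster raw GPS points into named places, which this sketch skips.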
10 Vector SBD from Geo-Social Media
Vector data sub-genre
  Point: location of a tweet, Ushahidi report, check-in, ...
  Line-strings, Polygons: roads in OpenStreetMap
Use cases: Persistent Surveillance
  Outbreaks of disease, Disaster, Unrest, Crime, ...
  Hot-spots, emerging hot-spots
  Spatial Correlations: co-location, teleconnection
11 Persistent Surveillance at American Red Cross
"Even before cable news outlets began reporting the tornadoes that ripped through Texas on Tuesday, a map of the state began blinking red on a screen in the Red Cross' new social media monitoring center, alerting weather watchers that something was happening in the hard-hit area." (AP, April 16th, 2012)
12 Graph SBD: Temporally Detailed
Spatial Graphs, e.g., Roadmaps, Electric grid, Supply Chains, ...
Temporally detailed roadmaps [Navteq]
Use cases: Accessibility by time of week, Best start time, Best route at different start times
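The "best start time" and "best route at different start times" use cases can be illustrated with a time-dependent shortest-path search. This is a minimal sketch under stated assumptions, not the deck's algorithm: the three-node network, the `highway` and `constant` travel-time functions, and the candidate start times are invented for illustration, and the label-setting search is only valid when edges are FIFO (leaving later never means arriving earlier), which the toy functions satisfy.

```python
import heapq

def earliest_arrival(graph, source, target, depart):
    """Time-dependent Dijkstra: earliest arrival at target, leaving source at depart.

    graph maps node -> list of (neighbor, travel_fn), where travel_fn(t) gives the
    travel time in minutes when entering the edge at time t (minutes after midnight).
    """
    best = {source: depart}
    heap = [(depart, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            return t
        if t > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, travel in graph.get(u, []):
            arrive = t + travel(t)
            if arrive < best.get(v, float("inf")):
                best[v] = arrive
                heapq.heappush(heap, (arrive, v))
    return None

def highway(t):
    """Hypothetical congested edge: 60 min from 07:00, easing back to 30 by 09:00."""
    if 420 <= t < 510:
        return 60
    if 510 <= t < 540:
        return 60 - (t - 510)
    return 30

def constant(minutes):
    return lambda t: minutes

graph = {
    "A": [("C", highway), ("B", constant(20))],
    "B": [("C", constant(20))],
}

def best_start(graph, source, target, candidates):
    """Return (start, duration) with the shortest trip among candidate start times."""
    trips = [(earliest_arrival(graph, source, target, s) - s, s) for s in candidates]
    duration, start = min(trips)
    return start, duration

print(earliest_arrival(graph, "A", "C", 480))             # rush hour: detour via B
print(best_start(graph, "A", "C", [420, 480, 540, 600]))
```

During the assumed rush hour the search routes around the highway through B, and the best candidate start is the first one after congestion clears. On a temporally detailed roadmap the travel-time functions would come from recorded speed profiles rather than hand-written rules.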
13 Outline
Motivation
What is Spatial Big Data (SBD)?
SBD and Science
SBD Analytics
Conclusions
14 Big Data and Science
Science in the Petabyte Era
  Increasing Volume
  Heightened Complexity
  Demands for Interoperability
Nature, 7209(4), September 4, 2008
"Above all, data on today's scale require scientific and computational intelligence. Google may now have its critics, but no one can deny its impact, which ultimately stems from the cleverness of its informatics. The future of science depends in part on such cleverness again being applied to data for their own sake, complementing scientific hypotheses as a basis for exploring today's information cornucopia."
15 Preparing Science for Big-Data
Nature, 7209(4), September 4, 2008
Big Data Translates into Big Opportunities... and Big Responsibilities
"Sudden influxes of data have transformed researchers' understanding of nature before, even back in the days when 'computer' was still a job description. Unfortunately, the institutions and culture of science remain rooted in that pre-electronic era. Taking full advantage of electronic data will require a great deal of additional infrastructure, both technical and cultural."
16 Models in Science
Science: understand natural world
  Subjective, Objective (transparent, reproducible)
  Methods: Forward models, Backward models
Engineering: Solve problems optimizing cost, efficiency, etc.
Models (Forward vs. Backward)
  Manual (paper, pencil, slide rules, log tables, ...)
    Forward: Differential Equations (D.E.), Algebraic equations
    Backward: Parametric models, e.g., Regression, Correlations, sampling, Experiment design, Hypothesis testing, ...
  Assisted by computers (HPCC, cyberinfrastructure, data-intensive, big data)
    Forward: Computational Simulations using D.E.s, Agent-based models, etc.
    Backward:
      Bayesian: resampling, local regression, MCMC, kernel density estimation, neural networks, generalized additive models, ...
      Frequentist: frequent patterns, ...
      Model ensembles, hypothesis generation, ...
      Exploratory Data Analysis: data visualization, visual analytics, geographic information science, spatial data mining, ...
17 Outline
Motivation
What is Spatial Big Data (SBD)?
SBD and Science
SBD Analytics
SBD Infrastructure
Conclusions
18 Pre-Electronic Era Models: Example
Cholera in London
  Cases clustered near the Broad St. water pump, except at a brewery
Recent Decades
  Proximity vs. Accessibility
19 Complication Dimensions
Spatial Networks
Time
From Hotspots to Mean Streets
Challenges: Trade-off between semantic richness and scalable algorithms
21 Pre-Electronic Models: Example 2
Location Prediction
  Models to predict location, time, path, ...
  Nest sites, minerals, earthquakes, tornadoes, ...
Pre-electronic models, e.g., Regression
  Assumed i.i.d. to simplify parameter estimation
  Least squares easy to hand-compute
Alternatives
  Spatial Autoregression: $y = \rho W y + X\beta + \varepsilon$
  Geographically Weighted (Local) Regression
  Parameter estimation is compute-intensive!
  $\ln(L) = \ln\lvert I - \rho W \rvert - \frac{n \ln(2\pi)}{2} - \frac{n \ln(\sigma^2)}{2} - SSE$
Next
  Non-i.i.d. errors: Distance based
  Spatio-temporal vector fields (e.g., flows, motion)
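To see why parameter estimation for spatial autoregression is compute-intensive, it helps to evaluate the log-likelihood once by hand. The sketch below uses the standard Gaussian form for the model y = ρWy + Xβ + ε and assumes that the slide's SSE term stands for the scaled residual sum ε'ε / (2σ²); that reading, the four-site ring neighborhood matrix `W`, and the function names are assumptions made for illustration.

```python
import math

def log_abs_det(a):
    """ln|det(a)| via Gaussian elimination with partial pivoting, O(n^3)."""
    m = [row[:] for row in a]
    n = len(m)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return total

def sar_log_likelihood(y, W, X, beta, rho, sigma2):
    """Gaussian log-likelihood of y = rho*W*y + X*beta + eps, eps ~ N(0, sigma2*I)."""
    n = len(y)
    A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)]
         for i in range(n)]
    # Residuals: eps = (I - rho*W) y - X beta
    eps = [sum(A[i][j] * y[j] for j in range(n))
           - sum(X[i][k] * beta[k] for k in range(len(beta)))
           for i in range(n)]
    sse = sum(e * e for e in eps)
    return (log_abs_det(A)
            - 0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - sse / (2 * sigma2))

# Hypothetical row-normalized neighborhood matrix: four sites on a ring.
W = [[0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0]]
y = [1.0, 2.0, 3.0, 4.0]
X = [[1.0] for _ in range(4)]   # intercept only
beta = [2.5]

for rho in (0.0, 0.25, 0.5):
    print(rho, sar_log_likelihood(y, W, X, beta, rho, 1.25))
```

Every trial value of ρ needs a fresh ln|I − ρW|, which costs O(n³) with dense elimination; that term is absent from ordinary least squares, where ρ = 0 and the determinant term vanishes.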
22 Example 3: Global vs. Local Regression
Example: Lilac Phenology data
  Yearly date of first leaf and first bloom
  1126 locations in US & Canada
Global regression model shows a mystery
  Positive slope => blooms delayed in recent years!
Spatial decomposition solves the mystery
  East of Mississippi, West of Mississippi
  Each half has negative slope => blooms earlier in recent years!
  However, slopes are different across east & west
  More reports in west in recent years
[Figure legend: River; Station]
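The sign flip between the global and the per-region fits can be reproduced on a toy data set. The numbers below are invented to mimic the slide's description (earlier blooms over time in both halves, later bloom dates and more recent reports in the west); they are not the lilac phenology data, and `ols_slope` is a hypothetical helper.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Made-up (region, year, day-of-year of first bloom) records.
records = [
    ("east", 2000, 100), ("east", 2001, 99), ("east", 2002, 98), ("east", 2003, 97),
    ("west", 2002, 140), ("west", 2003, 138), ("west", 2004, 136), ("west", 2005, 134),
]

def slope_for(region=None):
    """Fit bloom day against year for one region, or for all records if None."""
    rows = [r for r in records if region is None or r[0] == region]
    return ols_slope([year for _, year, _ in rows], [day for _, _, day in rows])

global_slope = slope_for()
east_slope = slope_for("east")
west_slope = slope_for("west")
print(global_slope, east_slope, west_slope)
```

The pooled fit has a positive slope only because later years are dominated by western stations, which bloom later in the year; within each region the trend is negative, and the two regional slopes differ.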
23 Outline
Motivation
What is Spatial Big Data (SBD)?
SBD and Science
SBD Analytics
Conclusions
24 Spatial Big Data (SBD) Summary
SBD are becoming available
  Geo-social Media, Geo-Sensor Networks, Geo-Simulations, VGI, ...
Big Opportunities
  Data: Quicker detection of disease outbreaks, e.g., Google Flu Trends
  Multi-decade large-area studies, e.g., Gulf Study, Exposomics, ...
  Intervention: How can geo-social networks induce desired behavior?
    Health effects of friends, e.g., smoking, drinking, exercise, nutrition, optimism, ...
  Large-scale Collaboration on Complex Questions
    Studies with thousands of doctors and hundred million humans
... and Big Responsibilities
  Institutions and culture of science remain rooted in the pre-electronic era
    Ex. Hotspots to Mean Streets
  Big data exceeding capacity of traditional systems