Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer
Evolving Technology Platforms: Geographic Information Systems rely on the technology of the era. Compass, telescope, sextant, paper maps; mainframe computers; workstations, GIS applications; IT revolution, spatial databases; GeoEnabled infrastructure: LiDAR, mobile, stream processing, sensors, cloud computing.
Disappearing line between Geospatial Technologies and Information Technologies (diagram labels: SOA, Mapping, Digital data file, Geographic Information Systems, Spatial Information Technology)
Latest Technology Trends: Big Data technology (Hadoop, MapReduce, Hadoop Distributed File System (HDFS), Apache Spark); Cloud Computing.
Big Data Technology Defined. Big Data: techniques and technologies that enable enterprises to effectively and economically analyze all of their data.
Big Data Definition. The 3Vs (2001 Meta Group, now Gartner, definition of Big Data): Volume (amount of data), Velocity (speed of data in and out), Variety (range of data types and sources). In 2012 IBM added a 4th V: Veracity (uncertainty of data). Current viewpoint: Big Data = Hadoop. Emerging viewpoint: Big Data = Hadoop + Relational + NoSQL (2013 Facebook, 2014 Gartner). Is Big Spatial Data different from Big Data? How does Big Spatial Data fit into GIS?
Big Data Architecture (diagram labels): Big Data Analytics: discover and predict, fast. Big Data Applications: accelerate data-driven action. Data Capital. Big Data Management: simplify access to all data. Big Data Integration: connect and govern any data. Copyright 2014, Oracle and/or its affiliates. All rights reserved. Oracle Confidential
Key Factors: simplify access to all data; discover and predict, fast; govern and secure all data.
Big Data + Advanced Analytics. Profile: easily add data and see it automatically and continuously cataloged, enriched and related. Find: use familiar guided search across massive amounts of diverse data. Understand: know what's important from diagnostic analysis of millions of data characteristics. Transform: powerful tools to quickly clean up and wrangle dirty data so it's ready to go. Discover: uncover valuable new insights. Predict: use new insights to define and refine predictive models. Collaborate: publish, share and evolve as you learn more.
Cloud Computing: Cloud computing enables customers to consume compute resources as a utility, just like electricity. No need to build and maintain computing infrastructure in-house. Involves large data centers run by cloud providers. Public cloud and private cloud. Infrastructure as a service (IaaS): Amazon AWS storage. Platform as a service (PaaS): IBM, Oracle, MS Azure. Software as a service (SaaS): AWS web services, Oracle, IBM.
Elastically Scalable: Consumers can scale up as needs increase and then scale down again as demand decreases. Elasticity is ideally dynamic and transparent, but can also be a specific action; most important is that it is possible. Applies to storage, infrastructure, and software. Elasticity also implies fault tolerance built into the system: seamlessly transfer the state of the application to a backup if the primary fails. Virtualization software is very important to achieve this goal.
Self-Service Operations: End users can spin up computing resources for almost any type of workload on demand. This applies to storage and compute resources. All application and system related operations that a customer performs should be accessible via self-service, without filing a service request with either support or cloud operations teams. This involves managing space (e.g. their block store or object store space), being able to access and analyze diagnostic logs, and being able to migrate data and metadata from one environment to another.
Pay Per-Use: Computing resources are measured at a granular level, which allows users to pay only for the resources and workloads they use. This is one of the most important aspects for the growth of the cloud. Consumers can now access a very large pool of computing resources when required without worrying about the cost or management of these hardware resources.
Big Data vs Cloud Computing: Is Big Data the same as cloud computing? Not really, but they are tightly related. Big Data by itself is not affordable for all consumers: there is a large infrastructure cost to build cluster computing resources, and a human cost to find trained IT staff to manage them. But cloud service providers can afford to manage these large computing resources and make slices of them available to consumers. Cloud computing draws on many technologies: Hadoop, MapReduce, relational DBs, middleware technology.
Spatial Big Data and Cloud Computing challenges
Spatial Big Data Challenges: Geo-tagging in the context of partial or indirect reference. Minimize the time it takes to make the data available for analysis. Discover spatial and temporal correlations between different data points. Data loading time should be minimal to make the data available for use: load the data for immediate use, but create spatial indexes over time. How to leverage the code from spatial database applications developed over the years. Predictive analytics for various applications.
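The idea of loading data for immediate use and building spatial indexes over time can be sketched as follows. This is a minimal illustration in plain Python, not a description of any Oracle product: the class name `LazyGridStore`, the uniform grid index, and the `index_some` background step are all assumptions made for the example.

```python
from collections import defaultdict


class LazyGridStore:
    """Hypothetical point store: queryable right after load, indexed later."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.points = []               # all loaded (x, y, payload) records
        self.grid = defaultdict(list)  # grid cell -> positions in self.points
        self.indexed_upto = 0          # records before this position are indexed

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def load(self, records):
        """Append records without touching the index, so loading stays cheap."""
        self.points.extend(records)

    def index_some(self, batch=1000):
        """Index the next `batch` unindexed records (e.g. from a background job)."""
        end = min(self.indexed_upto + batch, len(self.points))
        for pos in range(self.indexed_upto, end):
            x, y, _ = self.points[pos]
            self.grid[self._cell(x, y)].append(pos)
        self.indexed_upto = end

    def query(self, xmin, ymin, xmax, ymax):
        """Return payloads inside the box, using the index where it exists."""
        candidates = []
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                candidates.extend(self.grid.get((cx, cy), []))
        # Records that are not indexed yet are checked with a linear scan.
        candidates.extend(range(self.indexed_upto, len(self.points)))
        result = []
        for pos in sorted(candidates):
            x, y, payload = self.points[pos]
            if xmin <= x <= xmax and ymin <= y <= ymax:
                result.append(payload)
        return result
```

Queries return the same answers before and after indexing; only the amount of scanning changes as `index_some` catches up.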
Location Infused Technology: Java, Databases, Applications, Cloud
GeoSpatial Big Data Sources: Traditional data sources: raster (satellite imagery, elevation models, images); vector (road networks, admin boundaries). Machine generated; Internet of Things; social media; sensors; in-vehicle navigation systems (trajectories, traffic information); mobile phones.
Extend Spatial Analytics with Cloud and Big Data
Predictive Analysis Based on Tweets: How to infer potential trouble based on tweets? Data may have more than one spatial location: tweets are generated from one location, but a tweet might refer to events at a different location. Find the trend, e.g. "meet at NYC city center at 4 PM", "protest against climate change". Take action to deploy law enforcement to stop any potential crowd trouble. Needs new algorithms to find spatial-temporal correlations and predict future events.
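One very simple way to look for a spatial-temporal trend of this kind is to bin tweets by the place and time they talk about, rather than where they were sent from. The sketch below is only an illustration under strong assumptions: the field names `from_lonlat`, `about_lonlat` and `about_hour` are hypothetical, and it assumes the referenced location and time have already been extracted from the tweet text, which is itself the hard part.

```python
import math
from collections import Counter, defaultdict


def cell_of(lon, lat, size_deg):
    """Map a coordinate to a grid cell of `size_deg` degrees."""
    return (math.floor(lon / size_deg), math.floor(lat / size_deg))


def find_hotspots(tweets, min_mentions=3, size_deg=0.1):
    """Return {(cell, hour): (mentions, distinct origin cells)} for busy bins."""
    mentions = Counter()
    origins = defaultdict(set)
    for t in tweets:
        # Bin by the location and hour the tweet refers to.
        key = (cell_of(*t["about_lonlat"], size_deg), t["about_hour"])
        mentions[key] += 1
        # Track where the tweets were sent from, which may be elsewhere.
        origins[key].add(cell_of(*t["from_lonlat"], size_deg))
    return {
        key: (count, len(origins[key]))
        for key, count in mentions.items()
        if count >= min_mentions
    }


tweets = [
    {"from_lonlat": (-122.4, 37.8), "about_lonlat": (-73.985, 40.758), "about_hour": 16},
    {"from_lonlat": (-73.9, 40.7), "about_lonlat": (-73.986, 40.757), "about_hour": 16},
    {"from_lonlat": (-87.6, 41.9), "about_lonlat": (-73.984, 40.759), "about_hour": 16},
    {"from_lonlat": (-73.9, 40.7), "about_lonlat": (-73.985, 40.758), "about_hour": 9},
]
hotspots = find_hotspots(tweets, min_mentions=3, size_deg=1.0)
```

Here three tweets sent from three different places all refer to the same cell at hour 16, so that bin is reported; the fixed threshold stands in for the predictive model the slide calls for.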
Spatial Cloud Services: System developers (CS background): focus on big data, database, Spark, enabling data sets, etc. Application developers (GIS background): think about solving bigger problems; more analysis frameworks and data sets are available now; no barriers to entry; predictive analytics.
Precision Farming Example: Goal: build a predictive analytical model to increase the crop yield. Minimize water resources; minimize fertilizer; minimize the human capital cost. Use all available sensor-based data sources: satellite imagery, ground-based sensors, etc.
How to build a precision farming application: Acquire satellite data as required. Acquire disk storage for storing the data. Set up a cluster of machines to do the computations. Find a scientist to build the models required to do the analytics using the raster data. Expensive due to hardware, data acquisition and software costs.
Need to develop new Spatial Algorithms? MapReduce uses data partitioning to achieve high performance. Can we use divide-and-conquer algorithms without modifications? Depends on how the data is stored. Need new algorithms for new use cases. The data scientist's focus should be on analysis of data; storage and data management should be done by the system. This should be done via a model-driven architecture.
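To make the partitioning point concrete, here is a small single-process imitation of the map, shuffle and reduce phases using a spatial grid cell as the partition key. It is a sketch in plain Python, not Hadoop or Spark code; the function names and the per-cell count and centroid are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def map_point(point, cell_size):
    """Map phase: emit (grid cell, point) so nearby points share a key."""
    x, y = point
    yield (math.floor(x / cell_size), math.floor(y / cell_size)), point


def shuffle(pairs):
    """Shuffle phase: group values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_cell(key, points):
    """Reduce phase: per-cell point count and centroid."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return key, n, (cx, cy)


def run(points, cell_size=10.0):
    pairs = (kv for p in points for kv in map_point(p, cell_size))
    return sorted(reduce_cell(k, v) for k, v in shuffle(pairs).items())


result = run([(1, 1), (3, 3), (12, 1)])
```

A per-cell aggregate like this partitions cleanly. Operations whose answer can cross a cell boundary, such as nearest-neighbor search, would need each reducer to also see data from neighboring cells, which is one reason existing divide-and-conquer algorithms may not carry over unmodified.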
Spatial Cloud Services Development
Data Storage and Indexing: Systems should support a few data storage and indexing models: vector data, raster data, sensor data. Enable spatial and temporal search. Applications can choose one of the provided storage models based on the data and query requirements. Provide alternate ways to acquire data as required: web services, buy as needed, use from existing sources. Provide reference data and models. Free up the data scientist to do actual data analysis instead of data storage and layout models.
Geo-Spatial Big Data Management: Use once: data is loaded into the data store and analyzed once; extract summary or intelligence once and use it in other places. Use many times: query the data to answer different types of questions; produce new data products.
Data Analytics Challenge: separate silos of information to analyze (diagram label: Database)
Data Analytics Challenge: separate data access interfaces (diagram label: Database)
What does simplification mean for Spatial Big Data analytics? Before: PhD, data science. After: anyone, web service APIs???
Spatial Cloud Services Application Development
Advanced Analytics: Bring the analytics to the data. Understand the data: decipher the data to uncover hidden patterns that can be used for better decisions. Understand hidden correlations and use these relationships to solve business problems. Predict future outcomes based on observed data before they happen. Use predictive analytics, machine learning, and data mining techniques on big data.
Two Types of Analytical Approaches. Reactive: collect large volumes of data from event logs, web logs, etc.; process, analyze and extract summaries from the data; feed the summary data into a traditional data warehouse (DW) system. Proactive: process the data as it comes in to find the correlations; find out if the patterns in the new data mean something; initiate actions based on perceived patterns.
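The contrast between the two approaches can be shown with a toy event log. In this sketch the reactive path summarizes a fully collected log in one batch, while the proactive path inspects each event as it arrives and records an alert when a pattern appears. The class name `ProactiveMonitor`, the 60-second window and the threshold of 3 are hypothetical choices for the example only.

```python
from collections import Counter, deque


def reactive_summary(events):
    """Batch style: summarize a collected log, e.g. to feed a warehouse."""
    return Counter(kind for _, kind in events)


class ProactiveMonitor:
    """Streaming style: check every incoming event against a sliding window."""

    def __init__(self, window_s=60, threshold=3):
        self.window_s = window_s
        self.threshold = threshold
        self.recent = deque()  # (timestamp, kind) pairs inside the window
        self.alerts = []       # (timestamp, kind, count) actions triggered

    def observe(self, timestamp, kind):
        self.recent.append((timestamp, kind))
        # Drop events that have fallen out of the time window.
        while self.recent and self.recent[0][0] <= timestamp - self.window_s:
            self.recent.popleft()
        count = sum(1 for _, k in self.recent if k == kind)
        if count >= self.threshold:
            self.alerts.append((timestamp, kind, count))


events = [(0, "error"), (10, "login"), (20, "error"), (30, "error"), (200, "error")]

summary = reactive_summary(events)

monitor = ProactiveMonitor(window_s=60, threshold=3)
for ts, kind in events:
    monitor.observe(ts, kind)
```

The batch summary only reports that four errors occurred; the monitor raises its alert at the moment the third error lands inside one window, which is when an action could still be initiated.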
Precision Farming Example: Multi-band raster data (RGB, thermal, vegetation). Analyze the thermal band for vegetation properties. Compute NDVI models. Results can be used to model water requirements for different parts of the farm, growth indicators, and fertilizer schedules, and to identify poor growth (caused by pests).
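NDVI (Normalized Difference Vegetation Index) is conventionally computed per pixel from the near-infrared and red bands as (NIR - Red) / (NIR + Red). The sketch below applies that formula to small nested lists standing in for raster bands. It assumes the slide's "vegetation" band is a near-infrared reflectance band, and the 0.3 cutoff used to flag weak growth is an arbitrary illustrative value, not an agronomic recommendation.

```python
def ndvi(nir, red):
    """NDVI for one pixel; returns 0.0 where both bands are zero."""
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply NDVI pixel by pixel to two equally sized 2-D bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


def low_vigor_cells(grid, threshold=0.3):
    """Return (row, col) positions whose NDVI is below the threshold."""
    return [
        (i, j)
        for i, row in enumerate(grid)
        for j, value in enumerate(row)
        if value < threshold
    ]


# Tiny made-up reflectance values, for illustration only.
nir_band = [[0.5, 0.4], [0.3, 0.0]]
red_band = [[0.1, 0.3], [0.3, 0.0]]

grid = ndvi_grid(nir_band, red_band)
weak = low_vigor_cells(grid)
```

In a real pipeline the bands would come from satellite rasters and the flagged cells would feed the water, growth and fertilizer models listed on the slide.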
Application Development in Spatial Cloud (diagram labels). Before: Application Development, Database. After: Spatial Cloud APIs, Database.
Breaking Barriers with Cloud Computing: Cloud computing is changing the way systems are built. No more proprietary data silos: a better result than what OGC/ISO standards have achieved in this respect. No more closed systems. Traditional software development paradigms are changing. On-premise cloud will replace most of the on-premise proprietary systems.