Big Data, Fast Data, Spatial Data
Making Sense of Location Data in a Smart City
Hans Viehmann, Product Manager EMEA, ORACLE Corporation
August 19, 2015
Copyright 2014, Oracle and/or its affiliates. All rights reserved.
Location Information in Smart Cities
Everyone uses and shares Location Data
- Where is...
- How do I get to...
- Find me the nearest...
- When is the bus coming?
- I have checked in at... on Foursquare.
- Today I'm at Africa Geospatial Forum 2015.
- GPS trace: ...259... N 53 35.469, E 10 01.261 ... N 53 35.473, E 10 01.263 ... N 53 35.477, E 10 01.265 ... N 53 35.481, E 10
Importance of Location Information
Some examples based on the Oracle City Platform
- Improved Citizen Services
  - Better Public Transport
  - Social Media Interaction
- Improved Citizen Security
  - Predictive Policing
  - Social Media Analytics
- Improved City Operations
  - Streamlined Process Management
  - Optimized Field Service
Big Data Characteristics
- Volume, Velocity, Variety, Value
[Figure labels: SOCIAL, BLOG, SMART METER]
Different approaches to addressing Big Data challenges
Dealing with Big Data, Fast Data, Spatial Data
- Optimize the use of existing software platforms
  - Using more intelligent algorithms, parallelisation, clustering,...
  - Benefiting from existing knowledge and tools
- Upgrade the hardware environment
  - Processors, memory, flash cards, InfiniBand,...
  - May want to consider Engineered Systems
- Extend the architecture to include additional technologies
  - Event-processing engines to deal with streaming data
  - MapReduce technologies on low-cost commodity hardware platforms
Using Hadoop to Address Big Data Challenges
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Hadoop is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
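The "simple programming model" referred to here is MapReduce. The following is a minimal in-process sketch of its three phases (map, shuffle, reduce) in plain Python. It does not use Hadoop itself; the function names and the sample records are assumptions made for illustration only.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run map, shuffle and reduce in a single process (no Hadoop involved)."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: values are grouped by key.
            groups[key].append(value)
    # Reduce phase: each key's values are combined independently.
    return {key: reducer(key, values) for key, values in groups.items()}

def count_mapper(record):
    # Hypothetical record layout: a dict with a "source" field.
    yield record["source"], 1

def sum_reducer(key, values):
    return sum(values)

records = [
    {"source": "social"},
    {"source": "smart_meter"},
    {"source": "social"},
]
counts = map_reduce(records, count_mapper, sum_reducer)
```

Because mappers see one record at a time and reducers see one key at a time, both phases can in principle be spread across many machines, which is the property the framework exploits.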
Benefit of Big Data Technologies
- Low cost and high horizontal scalability infrastructure
- Allowing storage of more data, more details over longer time periods
- Cost-effective way to analyse huge amounts of data
- Dealing with variable data by means of schema-on-read capability
- Complementary to existing data warehouse technologies
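Schema-on-read means raw records are stored unchanged and a schema is applied only when the data is read. A minimal sketch in plain Python, assuming JSON lines as the raw format; the field names and the `read_with_schema` helper are hypothetical and not part of any Oracle or Hadoop API.

```python
import json

# Raw records are kept exactly as they arrived; fields vary from line to line.
RAW_LINES = [
    '{"id": 1, "lat": 53.59, "lon": 10.02, "text": "hello"}',
    '{"id": 2, "text": "no location"}',
    '{"id": 3, "lat": "53.60", "lon": "10.03", "lang": "de"}',
]

def read_with_schema(lines, schema):
    """Project each raw record onto the schema chosen at read time."""
    for line in lines:
        raw = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = raw.get(field)
            # Missing fields become None; present fields are cast to the schema type.
            row[field] = cast(value) if value is not None else None
        yield row

# The schema belongs to the query, not to the stored data.
LOCATION_SCHEMA = {"id": int, "lat": float, "lon": float}
rows = list(read_with_schema(RAW_LINES, LOCATION_SCHEMA))
```

A different analysis could read the same stored lines with a different schema, for example one that keeps `text` and `lang` and ignores the coordinates.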
Conventional database or Big Data technologies
Typical technical decision criteria
[Radar chart comparing Hadoop and Relational on a scale of 0 to 5. Axes: Straight Through Processing (STP), Tooling maturity, Stringent Non-Functionals, Ingestion rate, Cost effectively store low value data, ACID transactional requirement, Security, ETL simplicity, Data sparsity, Variety of data formats]
Data Architecture for Big Data
[Architecture diagram. Labels: Business Data, Events and Streaming, Analytic Tools, Data Streams, Social/Log Data, Enterprise Data, Other Data Sources, Data Platform (Reservoir, Factory, Warehouse), Discovery Lab, Actionable Events, Actionable Insights, Actionable Information, Actionable Discoveries]
But does this work for geospatial data?
Location Intelligence has specific requirements
- Standard MapReduce works for geometric operations on single objects, e.g. determining the centroid
- Needs to deal with projections and complex operations such as buffering,...
- More complex processing, e.g. spatial joins, usually requires spatial indexing
- Spatial data usually comes in specific formats (Shapefiles, GeoJSON,...)
- Needs to cope with location information which is only included implicitly; this requires geo-enrichment
- Visualization is very valuable for inspection of source data and results
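The centroid case works in standard MapReduce because each object can be processed without looking at any other object. A minimal sketch of such a map-only step in plain Python, assuming polygons given as lists of planar (x, y) vertices; the record layout and function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, float]

def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon: zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3.0 * area2), cy / (3.0 * area2)

def centroid_mapper(records: Iterable[Tuple[str, List[Point]]]) -> Iterator[Tuple[str, Point]]:
    """Map-only step: one output per input object, no shuffle or reduce needed."""
    for feature_id, ring in records:
        yield feature_id, polygon_centroid(ring)

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
centroids = dict(centroid_mapper([("parcel-1", square)]))
```

A spatial join is different: every object on one side may interact with any object on the other, so without an index the work grows with the product of the two input sizes.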
Oracle Big Data Spatial and Graph
Spatial Analysis Features for:
- Location Data Enrichment
- Proximity and containment analysis
- Vector and raster data preparation
- Map visualization
Property Graph Features for:
- Flexible, schema-less data storage and maintenance
- Support of huge volumes of connected data
- In-memory graph analytics
Spatial Features
Technical overview
- Support for spatial data in 2D or 3D in various formats, geodetic or projected
- Support for geo-referenced imagery such as satellite images in many formats
- MapReduce framework for resolution of placenames and determination of location hierarchies, including the GeoNames dataset as a reference
- Spatial indexing techniques for fast retrieval of spatial data
- Library of spatial operators for geometric analysis (inside, within distance, anyinteract,...)
- Library of image processing functions (mosaic, reprojection, format conversion, analysis,...)
- Console for visual analysis, indexing, processing
  - Sample JEE application to be deployed in Jetty
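To illustrate what operators such as "inside" and "within distance" compute, here is a minimal sketch in plain Python. It is not the Oracle library and does not reproduce its API; the function names are hypothetical, the distance uses the haversine formula on a spherical Earth, and the containment test is a planar ray-casting check.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two geodetic points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_distance(p, q, max_m):
    """True if (lat, lon) points p and q are at most max_m metres apart."""
    return haversine_m(p[0], p[1], q[0], q[1]) <= max_m

def inside(point, ring):
    """Planar point-in-polygon test by ray casting (even-odd rule)."""
    x, y = point
    result = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Count edges that a horizontal ray from the point towards +x crosses.
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                result = not result
    return result

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
a = (53.0, 10.0)
b = (53.0 + 1.0 / 60.0, 10.0)  # one arc minute further north, roughly 1.85 km
```

With these inputs, `inside((1.0, 1.0), square)` is true and `within_distance(a, b, 2000)` is true, while a 1000 m threshold is not met.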
Example: aggregating tweets per state
Results in Console
[Map: Tweets in May by State]
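The aggregation behind such a map can be sketched as two steps: assign each tweet to a state, then count per state. The sketch below is plain Python, not the product's MapReduce job. The state extents are hypothetical rectangles rather than real boundaries, and the tweet records are invented for illustration.

```python
from collections import Counter

# Hypothetical extents as (min_lon, min_lat, max_lon, max_lat); real state
# boundaries are polygons and would need a point-in-polygon test.
STATE_BOXES = {
    "state_a": (0.0, 0.0, 10.0, 10.0),
    "state_b": (10.0, 0.0, 20.0, 10.0),
}

def locate(lon, lat, boxes):
    """Return the name of the first box containing the point, or None."""
    for name, (min_lon, min_lat, max_lon, max_lat) in boxes.items():
        # Half-open intervals so a point on a shared edge matches one box only.
        if min_lon <= lon < max_lon and min_lat <= lat < max_lat:
            return name
    return None

def tweets_per_state(tweets, boxes):
    """Count tweets per containing state; tweets outside all boxes are skipped."""
    counts = Counter()
    for tweet in tweets:
        state = locate(tweet["lon"], tweet["lat"], boxes)
        if state is not None:
            counts[state] += 1
    return counts

tweets = [
    {"lon": 2.0, "lat": 3.0},
    {"lon": 12.0, "lat": 5.0},
    {"lon": 9.9, "lat": 9.0},
    {"lon": 25.0, "lat": 5.0},  # outside every box
]
counts = tweets_per_state(tweets, STATE_BOXES)
```

The linear scan in `locate` is where a spatial index would be used once there are many regions.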
Examples: Raster data preparation
- Pyramiding: layers at different resolution
- Mosaic images
- Terrains and contours
- Shaded reliefs
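Pyramiding can be sketched as repeatedly halving the resolution of a raster. The example below is a minimal plain-Python illustration that averages 2x2 blocks. Real raster tooling offers several resampling methods, and the function names here are hypothetical.

```python
def downsample(grid):
    """Halve resolution by averaging each 2x2 block (an odd trailing row or column is dropped)."""
    rows, cols = len(grid) // 2, len(grid[0]) // 2
    return [
        [
            (grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
             + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def build_pyramid(grid, min_size=1):
    """Return [full resolution, half, quarter, ...] down to min_size cells per side."""
    levels = [grid]
    while len(levels[-1]) >= 2 * min_size and len(levels[-1][0]) >= 2 * min_size:
        levels.append(downsample(levels[-1]))
    return levels

raster = [
    [0, 0, 4, 4],
    [0, 0, 4, 4],
    [8, 8, 12, 12],
    [8, 8, 12, 12],
]
pyramid = build_pyramid(raster)
```

A viewer can then pick the level whose resolution matches the current zoom instead of reading the full-resolution image every time.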
The Big Picture
Oracle Big Data Management System
[Diagram with regions DATA RESERVOIR, DATA WAREHOUSE and SOURCES. Components shown: Cloudera Hadoop, Oracle Big Data SQL, Oracle NoSQL, Oracle R Distribution, Oracle Big Data Spatial and Graph, Oracle Event Processing, Big Data Appliance, Apache Flume, Oracle GoldenGate, Oracle Big Data Connectors, Oracle Data Integrator, Oracle Database In-Memory, Multi-tenant, Oracle Industry Models, Oracle Advanced Analytics, Oracle Spatial & Graph, Exadata]
Geospatial data from positioning sensors
Fast Data
Streaming Data and Event Processing
- Increasing numbers of sensors deliver location data
- Streaming data: continuous, time ordered, does not end
  - May be hard to process in real time using relational technologies
- Typical use case for Event Processing / Event-Driven Architectures
  - Focus on changes in data rather than the individual data point
  - Filtering, aggregation, correlation, etc., as well as spatial analysis on streaming data
- Lightweight engines supporting distributed pre-processing
  - Reducing network load by moving pre-processing close to the source, e.g. RFID scanner
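Aggregation and filtering on a stream can be sketched with generators that process one event at a time and never need the whole data set. The example below is plain Python, not an event-processing engine. The event layout (timestamp in seconds, numeric value), the function names and the sample readings are assumptions for illustration.

```python
from collections import deque

def sliding_average(events, window_s):
    """For time-ordered (timestamp_s, value) events, yield (timestamp, average
    of the values seen in the last window_s seconds)."""
    window = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the time window.
        while window and window[0][0] <= ts - window_s:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)

def changes_only(readings, threshold):
    """Forward a reading only if it differs from the last forwarded one by at least threshold."""
    last = None
    for ts, value in readings:
        if last is None or abs(value - last) >= threshold:
            last = value
            yield ts, value

speed_events = [(0, 10), (5, 20), (10, 30), (20, 40)]
averaged = list(sliding_average(speed_events, window_s=10))
forwarded = list(changes_only(sliding_average(speed_events, window_s=10), threshold=10))
```

Running `changes_only` close to the sensor is one way to reduce network load, since readings that barely differ from the previous one are never sent.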
Oracle Stream Explorer
High-level Architecture
[Diagram: Sensors -> Input Adapter -> CEP Engine (queries over events) -> Output Adapter -> Backend Applications]
- Real-time event data
- Context-aware filtering, correlation, aggregation and processing of data
- Processed business events for downstream applications
Event Processing
Modelling of Event Processing Networks
    SELECT vehicleid
    FROM in-channel [now] gps, ContextualServiceData route
    WHERE inside@spatial(gps.location, route.geometry) = false
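The query above reports vehicles whose current GPS position is not inside their route geometry. The same idea can be sketched in plain Python, with one substitution stated plainly: instead of a polygon containment test, the sketch treats a position as on route if it lies within a tolerance of the route centre line. Coordinates are assumed to be planar (projected, in metres), and all names and sample data are hypothetical.

```python
import math

def distance_to_segment(p, a, b):
    """Planar distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def off_route(position, route, tolerance):
    """True if the position is farther than tolerance from every route segment."""
    return all(
        distance_to_segment(position, a, b) > tolerance
        for a, b in zip(route, route[1:])
    )

def vehicles_off_route(gps_events, routes, tolerance):
    """Yield the vehicle id for each (vehicle_id, position) event that is off route."""
    for vehicle_id, position in gps_events:
        if off_route(position, routes[vehicle_id], tolerance):
            yield vehicle_id

routes = {"bus1": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]}
gps_events = [
    ("bus1", (5.0, 0.5)),   # close to the first segment
    ("bus1", (5.0, 3.0)),   # 3 units from the nearest segment
    ("bus1", (10.2, 7.0)),  # close to the second segment
]
alerts = list(vehicles_off_route(gps_events, routes, tolerance=1.0))
```

Only the second event produces an alert, which matches the intent of the query: downstream applications see the deviation, not every position report.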
Summary
- New technologies have evolved over recent years to address Big Data challenges for Smart Cities
  - Dealing with larger volumes of unstructured or semi-structured data
  - Managing streaming data
  - Using semantic technologies to address interoperability
- Oracle provides spatial data management capabilities and location analytics
  - On data warehouses as well as on Big Data platforms such as Hadoop
  - Including 2D, 3D, vector, raster or point cloud support as well as visualization
  - Including geo-enrichment for Big Data environments and semantic technologies
  - Including spatial analysis on streaming data
Join our local community in South Africa
South African Oracle Users Group (SAOUG)
Special Interest Group for Spatial and Graph
- SAOUG Connect Conference
  - Annual User Conference, Sep. 20-22
  - Cape Town, Cape Sun Hotel
- Spatial SIG Meeting
  - Aug. 20, 9:00h to 12:00h
  - Oracle Office, Woodmead
Further information
- Overview on oracle.com: http://www.oracle.com/database/big-data-spatial-and-graph
- Oracle Technology Network: http://www.oracle.com/technetwork/database/database-technologies/bigdata-spatialandgraph
- Big Data Spatial and Graph blog: http://blogs.oracle.com/bigdataspatialgraph
- Oracle Spatial and Graph (!) group on LinkedIn
- ... or try it out using the latest Big Data Lite VM (v4.2), available for download