Big Data for Official Statistics: Processing Big and Fast Data, Optimizing Results with a Multi-Model Database

Big Data for Official Statistics
Processing Big and Fast Data: Optimizing Results with a Multi-Model Database
Steven Hagan, Vice President, Oracle Database Server Technologies
October 2015

Global Digital Data Growth: Exceeds Storage Manufacturing
Growing by leaps and bounds: 40+% year over year
2009 = 0.8 zettabytes (ZB): 0.08 ZB structured data, 0.72 ZB unstructured data
2020 = 35 zettabytes (ZB): 3.5 ZB structured data, 31.5 ZB unstructured data*
(1 zettabyte = 1 trillion gigabytes)
* The chart conservatively assumes a constant 9:1 ratio of unstructured to structured data (based on IDC's estimate that 90% of all digital data is unstructured). It does not reflect IDC's projection that unstructured data is currently growing twice as fast as structured data, at 63.7% vs. 32.3% CAGR.
Source: IDC Digital Universe Study, "A Digital Universe Decade: Are You Ready?", 2010
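The growth figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is an illustration added here, not part of the original deck: it derives the compound annual growth rate implied by the two endpoints (0.8 ZB in 2009, 35 ZB in 2020) and applies the slide's constant 9:1 split. The variable and function names are hypothetical. The implied rate comes out at roughly 41% per year, consistent with the "40+%" claim.

```python
# Figures taken from the slide (IDC Digital Universe Study, 2010).
start_year, end_year = 2009, 2020
start_zb, end_zb = 0.8, 35.0
unstructured_share = 0.9  # the slide's constant 9:1 assumption

years = end_year - start_year
# Compound annual growth rate implied by the two endpoints.
cagr = (end_zb / start_zb) ** (1 / years) - 1


def split(total_zb, share=unstructured_share):
    """Return (structured, unstructured) volumes in ZB for a given total."""
    unstructured = total_zb * share
    return total_zb - unstructured, unstructured


print(f"Implied CAGR: {cagr:.1%}")
for year, total in ((start_year, start_zb), (end_year, end_zb)):
    structured, unstructured = split(total)
    print(f"{year}: {structured:.2f} ZB structured, {unstructured:.2f} ZB unstructured")
```

Note that this is the growth rate of the total only; the slide's footnote points out that the 9:1 split understates unstructured growth.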

The Human Race Is Generating Data Vastly Faster Than It Is Making Computer Storage
[Chart: DATA vs. STORAGE over the years to 2020, on a scale running from terabytes through petabytes, exabytes, and zettabytes to yottabytes; labeled points "20 Z" and "300 E"]
Oracle Confidential Internal/Restricted/Highly Restricted

Data Volume & Variety: Generation Explosion Continues
Terabytes, Petabytes, Exabytes, Zettabytes
VIDEO: UAVs, DRONES, SURVEILLANCE
IMAGERY/Raster: Satellites, Planes
Sensors (IoT), LIDAR, 3D, RFID
Social Media, Web Scraping, Mobile Phones
New data products for: Land and Water Management, Agriculture, Environment, Transportation, Terrain and City Models, SDIs for planning, maintenance, Emergency response, Defense, Intelligence, Consumers
Location is a Powerful Organizing Principle
Semantics, Ontologies
Wearable Technologies
Genomics (DNA Sequencing), Astronomy
MULTIPLE VERSIONS OF THE ABOVE

Data Velocity: Real-Time
Spatially-aware Streams / Events / Sensors / Internet of Things (EVERYTHING)
Track / Monitor Moving Objects: UAVs, Drones, cars
Real-Time Business Intelligence
Ultra-high throughput (1 million/sec++) and microsecond latency
Sensors on Aircraft Turbine Blades
Filtering, correlation, and aggregation across event sources
Real-Time Pattern Detection: detect patterns in the flow of events and message payloads
Complex Event Processing (CEP): Business Intelligence in Real Time, Mobile Phones, Self-Driving Cars
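To make the filter / aggregate / detect idea concrete, here is a minimal sketch in plain Python. It is an illustration only and is not how Oracle Event Processing or any other CEP engine is implemented. The `Event` class, the `detect_sustained_high` function, the sensor names, and the thresholds are all hypothetical. It filters out low readings, keeps a sliding time window per source, and raises an alert when a source reports enough high readings within that window. Correlation across sources is left out for brevity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    source: str       # hypothetical sensor identifier
    timestamp: float  # seconds since the start of the stream
    value: float      # e.g. a normalized vibration reading


def detect_sustained_high(events, threshold, window_seconds, min_count):
    """Yield (source, timestamp, mean) whenever a source has at least
    min_count readings above threshold within window_seconds."""
    windows = {}  # source -> deque of recent high-value events
    for event in events:
        # Filter: ignore readings at or below the threshold.
        if event.value <= threshold:
            continue
        window = windows.setdefault(event.source, deque())
        window.append(event)
        # Expire events that have fallen out of the time window.
        while event.timestamp - window[0].timestamp > window_seconds:
            window.popleft()
        # Aggregate the window and report the pattern if it is present.
        if len(window) >= min_count:
            mean = sum(e.value for e in window) / len(window)
            yield event.source, event.timestamp, mean


stream = [
    Event("blade-1", 0.0, 0.2),
    Event("blade-1", 1.0, 0.9),
    Event("blade-2", 1.5, 0.95),
    Event("blade-1", 2.0, 0.95),
    Event("blade-1", 3.0, 1.0),
    Event("blade-2", 9.0, 0.9),
]
alerts = list(detect_sustained_high(stream, threshold=0.8,
                                    window_seconds=5.0, min_count=3))
for source, ts, mean in alerts:
    print(f"ALERT {source} at t={ts}: mean={mean:.2f}")
```

With this sample stream only "blade-1" triggers an alert, because "blade-2" never accumulates three high readings inside one five-second window.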

Processing Big & Fast Data: Video, Imagery, Sensors, Social, Mobile
Filter, Move, Transform, Analyze, Act - at High Velocity
FILTER, CORRELATE, AGGREGATE: Oracle Event Processing, Oracle Spatial
ENRICH & TRANSFORM: Oracle Coherence, Oracle GoldenGate, Oracle Data Integrator
ANALYZE: Oracle BAM, Oracle MapViewer, Oracle Business Intelligence, Oracle Information Discovery
ACT
STORE / SAVE / ARCHIVE?? THE RESULTS

TRENDS: Next 5 years or so
Computer System Performance
Hardware - Evolutionary:
Moore's law still holding
New possibilities at Research Level, not yet proven: DNA for Storage; 3D Glass, Holography; Carbon Nanotubes, Graphene
Software - Disruptive:
Parallelism enables clusters of 10,000+ computers, CLOUD
Software is Supporting many Data types - FLEXIBILITY
Databases/persistent stores can handle all types of data
Polyglot Persistence Software
Graph Storage, Semantics: add all types of data and build new relationships without disruptive upgrades / schema changes
Stream data arriving: filter the data; keep what matches your requirements; aggregate it
Deletions: immediately/gradually
NOTE: TEXT AND NUMBERS ARE NOT THE SPACE PROBLEM!
Copyright 2012, Oracle and/or its affiliates. All rights reserved. Confidential Oracle Restricted

SPECIAL DATA TYPES: SEVERAL POPULAR DATA MODELS
RELATIONAL: TABLES, ROWS
NoSQL: Key-Value
HADOOP
Sharding
GRAPHS: PROPERTY, RDF/OWL SEMANTICS
REAL TIME IN MEMORY
SPATIAL
IMAGING, VIDEO, AUDIO
COLUMNAR
DOCUMENT
But unique, separate persistent stores result in MANY databases to secure & manage

For National / UN Statistics: MULTI-MODEL Database is Best
Many Different Data Models Supported as Native Data Types in ONE SHARED STORE
RDBMSs; Graphs, Semantics; Spatial, Images, Videos; NoSQL
Parallel Database Server has multiple models
Unified Security Approach
Highly Available
Disaster Tolerant
Shares Main Memory: more efficient
Shares Disks, Flash Storage: more efficient
Managed as a single entity: more efficient
(ORACLE HAS THIS TODAY)
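The "one shared store" idea can be illustrated on a very small scale. The sketch below uses Python's built-in sqlite3 module purely as a stand-in; it is not Oracle's multi-model implementation and does not use native graph or document types. The table names, region names, and the `neighbours_with_surveys` helper are hypothetical. It keeps relational rows, schema-less JSON documents, and graph edges in the same database, so a single query can combine all three.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # one store holding all three models
conn.executescript("""
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
    CREATE TABLE survey_doc (region_id INTEGER REFERENCES region(id), body TEXT);
    CREATE TABLE borders (src INTEGER REFERENCES region(id),
                          dst INTEGER REFERENCES region(id));
""")

# Relational rows.
conn.executemany("INSERT INTO region VALUES (?, ?, ?)",
                 [(1, "North", 1200), (2, "South", 800), (3, "East", 500)])
# Schema-less documents stored as JSON text.
conn.executemany("INSERT INTO survey_doc VALUES (?, ?)",
                 [(1, json.dumps({"households": 400, "notes": "urban"})),
                  (2, json.dumps({"households": 310}))])
# Directed graph edges (adjacency between regions).
conn.executemany("INSERT INTO borders VALUES (?, ?)", [(1, 2), (2, 3)])


def neighbours_with_surveys(region_name):
    """Return (name, population, survey document or None) for each region
    reachable by one outgoing border edge from region_name."""
    rows = conn.execute("""
        SELECT n.name, n.population, d.body
        FROM region r
        JOIN borders b ON b.src = r.id
        JOIN region n ON n.id = b.dst
        LEFT JOIN survey_doc d ON d.region_id = n.id
        WHERE r.name = ?
    """, (region_name,)).fetchall()
    return [(name, pop, json.loads(body) if body else None)
            for name, pop, body in rows]


result = neighbours_with_surveys("North")
print(result)
```

Because everything lives in one database, the graph traversal, the relational lookup, and the document fetch share one connection, one transaction scope, and one security boundary, which is the efficiency argument the slide makes.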

Secured National Statistics: one Multi-Model Store
External Data Sources: Transactional & Operational Systems; Contents Repository; Databases; Mobile Devices, Web resources; Blogs, Mails, news; Satellite Imagery, UAVs; Real-time Data Streams
Search, Presentation, Report, Visualization, Query
SMS; Automatic Responses and Publishing; Console Alerts
CEP; EV Grid Management
Multi-Model Data Management Infrastructure: GeoSpatial, Historical Records, POIs, Demographics, Customer Data, Call Records, Documents
Workflow Initiation; Real-time Dashboards

Statistics Data Repurposing: Ontology-driven
Enable Shared, Actionable Knowledge
[Diagram labels: Application Ontologies; National Mapping; Private Cloud; RDF & OWL Metadata; Environmental Monitoring; Spatial Data; Geographic Names; Raster Data; Famine Relief; Simple Features; GeoRaster; Topology; Networks; Gazetteers; Data Integration; National Map schemas; Geographic names; Temporal; Naïve Geography; Disaster Response]

Support Breadth of National & UN Data ABOVE STOVEPIPES
Data arrives and is filtered; stored data is available to all Statistics Organizations
End-user and Developer Environments: Developers, Data Scientists, Business Users
[Diagram labels: Data Integration; JDeveloper; Discovery; Statistics; Mining; Business Intelligence; Dashboards; Data Services; Semantic Metadata Layer; Big Data Sources; Structured Data; Unstructured Data; Streaming; Services; Event Processing; Statistics; Data Mining; Data Management; Text Analytics; Natural Lang. Processing; ODBC; Graph Analytics; Sound and Video; JDBC; Spatial; Images; App Services; Web-log Sessionization and Enrichment; Applications; Vertical Applications; Social Media; NoSQL; Hadoop; Relational; Sentiment Analysis; Horizontal Applications; Reference Architecture; Compression; Security & Encryption]
GUIDANCE: THIS IS AN ARCHITECTURE TO SUPPORT ONE SHARED MULTIPURPOSE NATIONAL STORE

Semantic & Graph Technology
What terms to look for: Buzzwords for Apps & Workflows using
Semantic Web
W3C RDF/OWL/SPARQL
Graph Data Management
Social Network Analysis (SNA)
Knowledge Discovery
Knowledge Mining
Big Data
Schema-less Data
Property Graphs
Taxonomy/Terminology Mgmt
Faceted Search
Inferencing / Reasoning
Sentiment Analysis
Text Mining
NoSQL Database

Oracle: Graph ("Linked Open Data") support: On-premise or in the Cloud
Highly scalable, secure triple store based on RDF
1 TRILLION TRIPLE BENCHMARK, leading Triple Store: W3.org
1.13 million triples per second query performance
SPARQL and GeoSPARQL in SQL support
Apache Jena and OpenRDF Sesame pre-integrated
SPARQL endpoint enhanced with query control
GeoSPARQL support (classes, properties, datatypes, query functions)
Forward-chaining based inferencing engine in the database
Various native rulebases (RDFS, OWL2 RL, SKOS, ...), integration with OWL2 reasoners (TrOWL, Pellet)
RDB to RDF mapping on relational data aligned with RDB2RDF standard
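For readers unfamiliar with triple stores and forward-chaining inference, the following is a minimal in-memory sketch in plain Python. It is not Oracle's engine and does not use SPARQL. The `forward_chain` and `match` helpers and the `ex:` sample data are hypothetical. It applies two standard RDFS entailment rules (rdfs9 and rdfs11) repeatedly until no new triples appear, then answers a simple triple-pattern query over the result.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def forward_chain(triples):
    """Apply two RDFS rules until no new triples appear (a fixpoint).

    rdfs11: (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       -> (x type B)
    """
    inferred = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        new = set()
        for a, b in subclass:
            for s, p, o in inferred:
                if p == SUBCLASS_OF and s == b:
                    new.add((a, SUBCLASS_OF, o))   # rdfs11
                if p == RDF_TYPE and o == a:
                    new.add((s, RDF_TYPE, b))      # rdfs9
        if new <= inferred:
            return inferred
        inferred |= new


def match(triples, s=None, p=None, o=None):
    """Return sorted triples matching a pattern; None acts as a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


asserted = {
    ("ex:Lisbon", RDF_TYPE, "ex:CapitalCity"),
    ("ex:CapitalCity", SUBCLASS_OF, "ex:City"),
    ("ex:City", SUBCLASS_OF, "ex:PopulatedPlace"),
}
closure = forward_chain(asserted)
for triple in match(closure, s="ex:Lisbon", p=RDF_TYPE):
    print(triple)
```

Forward chaining materializes the inferred triples ahead of time, so a later query for every `ex:PopulatedPlace` finds `ex:Lisbon` without any reasoning at query time. The trade-off is extra storage and the need to re-run inference when the asserted data changes.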

Accessible Shared Data: CYBERSECURITY is Major Challenge
Requires Information Security and Privacy
Oracle Database: Encryption & Masking, Access Control, Monitoring, Blocking & Logging
Monitoring: Configuration Management, Audit Vault, Total Recall
Access Control: Database Vault, Label Security
Encryption & Masking: Advanced Security, Secure Backup, Data Masking

United Nations Analysis, September 2013
Initiative on Global GeoSpatial Information Management: Future Trends
Technology Trends in Data Creation, Maintenance, and Management
Reliance on big data technologies
The right information at the right time
Machine-processable descriptions of data
Semantic technologies will play an important role
Skills and Training: training individuals takes at least five years
Requirement for enhanced Data Management Systems

You Enhance Innovation & Statistics By Using STANDARDS
e.g. The Spatial Data Domain
ISO TC 211; TC 204
Open Geospatial Consortium: Simple Features; GML; Web Services
De-facto Standards: SHP, MGE, DXF, KML
Professional Standards: ISPRS, FIG, WMO
Java, .NET, Flash
W3C: RDF, OWL, SPARQL, GeoSPARQL
TAGGED METADATA: agree on tags
SQL3/MM Spatial

Public Clouds, Private Clouds: Statistics Platforms
Public Clouds (SaaS, PaaS, IaaS, over the INTERNET): used by multiple tenants on a shared basis; hosted and managed by cloud service provider; lower upfront costs; outsourced management; OpEx
Private Cloud (Apps, SaaS, PaaS, IaaS, over the INTRANET): exclusively used by a single organization; controlled and managed by in-house IT; greater control over security, compliance, QoS; lower total costs; CapEx & OpEx
Trade-offs between the two
ELASTICITY is key value of Clouds
YOU MAY NEED A CLOUD IN EACH COUNTRY: DEPENDS ON THEIR LAWS

Today: More HW/SW Efficiencies, But Labor Costs Growing
Innovative Systems for Statistics Needed
[Chart: spending over TIME on a $100B to $500B scale, by category: Hardware, Software, Maintenance, Services]

Guidance: Do Not Build Your Statistics Solutions From Scratch
Long Term Cost of Ownership rises with custom construction & Open Source
Time to Build
Optimizations
Maintenance
UN-GGIM: training individuals takes at least five years

Guidance: Big Data for Official Statistics: Success Enhanced with MULTI-MODEL DATABASE PLATFORM
Big Data; Big & Fast Data
Volunteered Geographic / Statistical Information
Sensors; Streaming Data
Simplify Statistics IT; Simplified IT
Support for Open Standards
Spatial Database, Application Server, BI, tools
Support by Leading Partner solutions
Deep Analytics
Real-time Complex Event Processing
Dense Visualization
On Premise, On Cloud, Shared Services
Georeferenced Video, 3D, LiDAR
Satellites
Multi-Model Engineered Systems
Spatial Analysis
Graph Analytics
Shared GeoSpatial Services
Location Aware Everything
Fully Parallel and Secure