Big Data Analytics WG: Use Case Array Databases



Similar documents
Agile Retrieval of Big Data with. EarthServer. ECMWF Visualization Week, Reading, 2015-sep-29

Agile Analytics on Extreme-Size Earth Science Data

Databases & Web Applications Lab Big Data Project A

A Big Picture for Big Data

RDA. RDA: Co-chairing - Big Data IG - Geospatial IG. OGC co-chairing: - editor, Big Geo Data stds - BigData.DWG

On the Efficient Evaluation of Array Joins

Handling Heterogeneous EO Datasets via the Web Coverage Processing Service

Adding Big Earth Data Analytics to GEOSS

WCS as a Download Service for Big (and Small) Data

OGC s Big Data Initiative

RDA PROPOSAL FOR Array Database Working Group (AD-WG) Peter Baumann, Jacobs University

Elephant Meeting on Big Data Services

Big Data Volume & velocity data management with ERDAS APOLLO. Alain Kabamba Hexagon Geospatial

VITO Centre of Image Processing

NetCDF and HDF Data in ArcGIS

Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework

INTEROPERABLE IMAGE DATA ACCESS THROUGH ARCGIS SERVER

Cloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research

Cloud Computing and Advanced Relationship Analytics

Big Data and Analytics: A Conceptual Overview. Mike Park Erik Hoel

Integrating Big Data into the Computing Curricula

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Log Mining Based on Hadoop s Map and Reduce Technique

NoSQL storage and management of geospatial data with emphasis on serving geospatial data using standard geospatial web services

NIST Big Data PWG & RDA Big Data Infrastructure WG: Implementation Strategy: Best Practice Guideline for Big Data Application Development

Big Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

Arrays in database systems, the next frontier?

HPC technology and future architecture

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Development of Earth Science Observational Data Infrastructure of Taiwan. Fang-Pang Lin National Center for High-Performance Computing, Taiwan

Cloudera Certified Developer for Apache Hadoop

Big Data Means at Least Three Different Things. Michael Stonebraker

Developing Fleet and Asset Tracking Solutions with Web Maps

ECS 165A: Introduction to Database Systems

Zynga Analytics Leveraging Big Data to Make Games More Fun and Social

massive aerial LiDAR point clouds

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

Search Big Data with MySQL and Sphinx. Mindaugas Žukas

Luncheon Webinar Series May 13, 2013

NoSQL for SQL Professionals William McKnight

Databases for 3D Data Management: From Point Cloud to City Model

Using In-Memory Computing to Simplify Big Data Analytics

Next-Generation Cloud Analytics with Amazon Redshift

Industry 4.0 and Big Data

Choosing The Right Big Data Tools For The Job A Polyglot Approach

Vectorwise 3.0 Fast Answers from Hadoop. Technical white paper

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India

Oracle Big Data SQL Technical Update

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

What Does Big Data Mean and Who Will Win? Michael Stonebraker

<Insert Picture Here> Data Management Innovations for Massive Point Cloud, DEM, and 3D Vector Databases

Hadoop s Entry into the Traditional Analytical DBMS Market. Daniel Abadi Yale University August 3 rd, 2010

Use of OGC Sensor Web Enablement Standards in the Meteorology Domain. in partnership with

Discovering Business Insights in Big Data Using SQL-MapReduce

Big Data Explained. An introduction to Big Data Science.

Spatio-Temporal Gridded Data Processing on the Semantic Web

Angelos Tzotsos IMIS Athena, Scientific & Technical Manager OSGeo Charter Member Copernicus Big Data Workshop, March 2014

SAP HANA Cloud Platform Overview Customer

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT

Geospatial Platforms For Enabling Workflows

Topics in basic DBMS course

The Fusion of Supercomputing and Big Data. Peter Ungaro President & CEO

Hierarchical Storage Support and Management for Large-Scale Multidimensional Array Database Management Systems 1

Oracle Big Data Spatial and Graph

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

May 2013 Oracle Spatial and Graph User Conference

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

Microsoft Band Web Tile

Big Data Executive Survey

Lecture 10: HBase! Claudia Hauff (Web Information Systems)!

Lustre Monitoring with OpenTSDB

Situational Awareness Through Network Visualization

Where is... How do I get to...

JDSU Partners with Infobright to Help the World s Largest Communications Service Providers Ensure the Highest Quality of Service

Using distributed technologies to analyze Big Data

Geospatial Platforms For Enabling Workflows

How To Scale Out Of A Nosql Database

Standard Big Data Architecture and Infrastructure

Geospatial Technology Innovations and Convergence

Remote Sensitive Image Stations and Grid Services

The Internet of Things and Big Data: Intro

Introduction to SQL for Data Scientists

Fields as a Generic Data Type for Big SpaLal Data

Spatio-Temporal Big Data

InfiniteGraph: The Distributed Graph Database

How To Handle Big Data With A Data Scientist

FIFTH EDITION. Oracle Essentials. Rick Greenwald, Robert Stackowiak, and. Jonathan Stern O'REILLY" Tokyo. Koln Sebastopol. Cambridge Farnham.

Connecting Hadoop with Oracle Database

Bigtable is a proven design Underpins 100+ Google services:

Oracle: Database and Data Management Innovations with CERN Public Day

Using an In-Memory Data Grid for Near Real-Time Data Analysis

Creating A Galactic Plane Atlas With Amazon Web Services

Oklahoma s Open Source Spatial Data Clearinghouse: OKMaps

Elastic Enterprise Data Warehouse Query Log Analysis on a Secure Private Cloud

Managing Large Imagery Databases via the Web

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect

Transcription:

Big Data Analytics WG: Use Case Array Databases RDA 4th Plenary 2014-sep-22, Amsterdam, The Netherlands Peter Baumann Jacobs University rasdaman GmbH baumann@rasdaman.com

Structural Variety in Big Data Stock trading: 1-D sequences (i.e., arrays) Social networks: large, homogeneous graphs Ontologies: small, heterogeneous graphs Climate modelling: 4D/5D arrays Satellite imagery: 2D/3D arrays (+irregularity) Genome: long string arrays Particle physics: sets of events Bio taxonomies: hierarchies (such as XML) Documents: key/value stores = sets of unique identifiers + whatever etc.

Arrays in [Geo] Science & Engineering spatio-temporal sensor, image, simulation, statistics data(cubes) sensor feeds [OGC SWE] Big Data server

rasdaman: Agile Array Analytics raster data manager : SQL + n-d raster objects select img.green[x0:x1,y0:y1] > 130 from LandsatArchive as img where avg_cells( img.nir ) < 17 Scalable parallel tile streaming architecture In operational use - OGC Web Coverage Service Core Reference Implementation

Inset: Hadoop is not the Answer to All no builtin knowledge about structured data types - Since it was not originally designed to leverage the structure [ ] its performance [ ] is therefore suboptimal [Daniel Abadi] M. Stonebraker (XLDB 2012): will hit a scalability wall

Adaptive Tiling Sample tiling strategies [Furtado]: - regular directional area of interest rasdaman storage layout language insert into MyCollection values... tiling area of interest [0:20,0:40], [45:80,80:85] tile size 1000000 index d_index storage array compression zlib

Sample Application: Database Visualization select encode( struct { red: (char) s.image.b7[x0:x1,x0:x1], green: (char) s.image.b5[x0:x1,x0:x1], blue: (char) s.image.b0[x0:x1,x0:x1], alpha: (char) scale( d.elev, 20 ) }, "image/png" ) from SatImage as s, DEM as d

Use Case: Plymouth Marine Laboratory Avg chlorophyll concentration for given area & time period, from x/y/t cube - 10, 60,120, 240 days Conclusions: - we must minimise data transfer as well as [client] processing - standards such as WCPS provide the greatest benefit [Oliver Clements, EGU 2014] rasdaman

Secured Archive Integration First-ever direct, ad-hoc mix from protected NASA & ESA services in OGC WCS/WCPS Web client (EarthServer + CobWeb)

Parallel / Distributed Query Processing 1 query 1,000+ cloud nodes select max((a.nir - A.red) / (A.nir + A.red)) - max((b.nir - B.red) / (B.nir + B.red)) - max((c.nir - C.red) / (C.nir + C.red)) - max((d.nir - D.red) / (D.nir + D.red)) from A, B, C, D Dataset C Dataset D Dataset A Dataset B

Array Databases: Practice Proven with rasdaman from simple data access to agile analytics - strictly based on open OGC Big Geo Data standards 130+ TB databases, 2D, 3D x/y/z & x/y/t, 4D x/y/z/t timeseries single query distributed to 1,000+ cloud nodes

OGC WCPS OGC Web Coverage Processing Service (WCPS) = high-level geo raster query language; adopted 2008 - WCPS 2: all grid types: "From MODIS scenes M1, M2, M3: difference between red & nir, as TIFF" but only those where nir exceeds 127 somewhere for $c in ( M1, M2, M3 ) where some( $c.nir > 127 ) return encode( $c.red - $c.nir, image/tiff ) (tiff A, tiff C ) 12

Recent Progress: ISO Array SQL ISO 9075 Part 15: SQL/MDA - resolved by ISO SQL WG in June 2014 n-d arrays as attributes create table LandsatScenes( id: integer not null, acquired: date, scene: row( band1: integer,..., band7: integer ) array [ 0:4999,0:4999] ) declarative array operations select id, encode(scene.band1-scene.band2)/(scene.nband1+scene.band2)), image/tiff ) from LandsatScenes where acquired between 1990-06-01 and 1990-06-30 and avg( scene.band3-scene.band4)/(scene.band3+scene.band4)) > 0