Research funded through EU FP7 283610
EarthServer: European Scalable Earth Science Service Environment
  Adding Big Earth Data Analytics to GEOSS
  GEO IX Plenary, Foz do Iguacu, 2012-nov-20
  Peter Baumann, Stefano Nativi
  Jacobs University, Germany; CNR, Italy
  [gamingfeeds.com]
Features & Coverages
  The basis of all: geographic feature = abstraction of a real-world phenomenon [OGC, ISO], associated with a location relative to Earth
  Special kind of feature: coverage
  Typical representative: raster image... but there is more!
  Typically, Big Data are coverages
Big Data: The 4 Vs
  Volume, Velocity, Variety, Veracity [M. Stonebraker and IBM]
Raster Data Volume
  Social Networks: incidence matrix of size 10^8 x 10^8... now do linear algebra!
  Satellite Imagery: ngeo planning: 10^12 images under ESA custody
  HPC: "Even with multi-terabyte local disk sub-systems and multi-petabyte archives, I/O can become a bottleneck in HPC." -- Jeanette Jenness, LLNL, ASCI-Project, 1998
  "Users download 10x more data than needed" -- Kerstin Kleese van Dam, 2002
Raster Data Velocity
  NASA MODIS instrument on board AQUA & TERRA: ~1 TB per day
  LOFAR: distributed sensor array farms for radio astronomy; 3 GB per second per station sustained, consolidated into 2-3 PB per year
  M. Stonebraker: "drinking from the firehose"
Raster Data Variety
  Sensor, image, model, & statistics data
  Life Science: pharma/chem, healthcare / bio research, bio statistics, genetics,...
  Geo: geodesy, geology, hydrology, oceanography, meteorology, earth system,...
  Engineering & research: simulation & experimental data in automotive/shipbuilding/aerospace industry, turbines, process industry, astronomy, high energy physics,...
  Management/Controlling: decision support, OLAP, data warehousing, census, statistics in industry and public administration,...
  Multimedia: e-learning, distance learning, prepress,...
  "80% of all data have some spatial connotation" [C&P Hane, 1992]
Raster Data Variety: Coverages
  n-d "space/time-varying phenomenon" [ISO 19123, OGC 09-146r2]
  [Diagram: «FeatureType» Abstract Coverage, with Grid Coverage, MultiSolid Coverage, MultiSurface Coverage, MultiCurve Coverage, MultiPoint Coverage, Referenceable GridCoverage, Rectified GridCoverage]
Raster Data Veracity
  Both measured and computed data need to carry quality information as part of provenance
  Sometimes there are established (costly!) procedures for error estimation, sometimes not
  Ex: satellite image processing, from L0 to L2
    Many quality criteria are determined, but hardwired error propagation is by far not always customary
  What to do with this information? It complicates the life of the data consumer dramatically!
  [l2gen, bitmask for ocean color]
Let's Take a Closer Look...
  Remember? "Users download 10x more data than needed" [Kerstin Kleese van Dam, 2002]
  Divergent access patterns for ingest and retrieval
  Server must mediate between access patterns
Use Case: Satellite Image Time Series [Diedrich et al 2001]
The rasdaman Raster Analytics Server
  Raster DBMS for massive n-d raster data; www.rasdaman.org
  rasql = SQL with integrated raster processing
    select img.green[x0:x1,y0:y1] > 130
    from LandsatArchive as img
  Tile-based architecture: n-d array = set of n-d tiles
  Extensive optimization, hw/sw parallelization
  In operational use: dozen-terabyte objects
  Analytics queries in 50 ms on laptop
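To make the rasql example on this slide concrete, here is a minimal sketch in plain Python of what the query expresses: trim one band to a bounding box, then compare every cell against a threshold. It is not rasdaman code. The nested-list band, the helper names `trim` and `threshold`, and the sample values are assumptions made for illustration, as is the reading of the bounds `x0:x1` as inclusive.

```python
# Illustrative only: a band is modelled as a list of rows, indexed band[x][y].
# The helper names and the sample data are hypothetical, not part of rasdaman.

def trim(band, x0, x1, y0, y1):
    """Return the sub-array band[x0:x1, y0:y1], treating both bounds as inclusive."""
    return [row[y0:y1 + 1] for row in band[x0:x1 + 1]]

def threshold(band, limit):
    """Cell-wise comparison, mirroring the '> 130' induced operation in the query."""
    return [[value > limit for value in row] for row in band]

green = [
    [120, 131, 140],
    [100, 129, 200],
    [135, 130, 90],
]

# Counterpart of: select img.green[0:1,1:2] > 130 from LandsatArchive as img
mask = threshold(trim(green, 0, 1, 1, 2), 130)
print(mask)  # [[True, True], [False, True]]
```

The point of the sketch is that only the trimmed region is touched; in a tile-based store, presumably only the tiles intersecting the bounding box would need to be read.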
Query Processing in a Federation
  Heterogeneous federation / cloud
  Can optimize for data location, transport volume, node load,...
  Work in progress [Owonibi 2012]
  Query:
    select encode( ( (A.nir - A.red) / (A.nir + A.red) - (B.nir - B.red) / (B.nir + B.red) ), HDF5 )
    from A, B
  array A:
    select encode( (A.nir - A.red) / (A.nir + A.red), array-compressed )
    from A
  Array B:
    select encode( (B.nir - B.red) / (B.nir + B.red), array-compressed )
    from B
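The query split on this slide can be sketched in plain Python as follows. This is an illustration of the idea, not the federation's actual mechanism: each node evaluates the (nir - red) / (nir + red) expression on its own array, and only those per-node results are combined. The function names, the dictionary layout of an image, and the sample values are all hypothetical.

```python
# Illustrative only: images are dicts of bands, each band a list of rows.
# node_query and federated_difference are hypothetical names for this sketch.

def ndvi(nir, red):
    """Cell-wise (nir - red) / (nir + red), as in the sub-queries on the slide."""
    return [
        [(n - r) / (n + r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir, red)
    ]

def subtract(a, b):
    """Cell-wise difference of two equally sized arrays."""
    return [[x - y for x, y in zip(a_row, b_row)] for a_row, b_row in zip(a, b)]

def node_query(image):
    """Runs where the data lives; ships one derived array instead of two bands."""
    return ndvi(image["nir"], image["red"])

def federated_difference(image_a, image_b):
    """Coordinator: collect the two partial results and combine them."""
    return subtract(node_query(image_a), node_query(image_b))

array_a = {"nir": [[0.8, 0.6]], "red": [[0.2, 0.2]]}
array_b = {"nir": [[0.5, 0.3]], "red": [[0.5, 0.1]]}
difference = federated_difference(array_a, array_b)
print(difference)
```

Under these assumptions each node transfers a single derived array rather than both of its bands, which is one way the transport volume mentioned on the slide could be reduced.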
What Raster Analytics Offers
  Raster Query Language: ad-hoc navigation, extraction, aggregation, analytics
  Time series
  Image processing
  Summary data
  Sensor fusion & pattern mining
Ex: Climate Data Service [MEEO 2012]
3D Clients: Experiments [JacobsU, Fraunhofer 2012]
  Problem: coupling DB / visualization
  Approach: deliver RGBA image to X3D client, transparency as height
  Feed directly into client GPU
    select encode( {
        red:   (char) s.b7[x0:x1,x0:x1],
        green: (char) s.b5[x0:x1,x0:x1],
        blue:  (char) s.b0[x0:x1,x0:x1],
        alpha: (char) scale( d, 20 )
      }, "png" )
    from SatImage as s, DEM as d
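The "transparency as height" approach can be illustrated with a small plain-Python sketch that packs three bands and an elevation grid into RGBA tuples. It is not the query's implementation: the `scale( d, 20 )` step is not modelled, the elevation grid is assumed to already match the image size, and the linear mapping of heights to 0..255 via a hypothetical `max_elevation` parameter is an assumption of this example.

```python
# Illustrative only: bands and elevation are lists of rows of equal shape.
# to_byte and compose_rgba are hypothetical helpers for this sketch.

def to_byte(value):
    """Clamp to the 0..255 range of a (char) channel, truncating fractions."""
    return max(0, min(255, int(value)))

def compose_rgba(red, green, blue, elevation, max_elevation):
    """Build rows of (r, g, b, a) tuples, with the alpha channel carrying height."""
    rows = []
    for r_row, g_row, b_row, e_row in zip(red, green, blue, elevation):
        rows.append([
            (to_byte(r), to_byte(g), to_byte(b), to_byte(255 * e / max_elevation))
            for r, g, b, e in zip(r_row, g_row, b_row, e_row)
        ])
    return rows

pixels = compose_rgba(
    red=[[10, 300]],
    green=[[20, 0]],
    blue=[[30, 5]],
    elevation=[[0, 1000]],
    max_elevation=2000,
)
print(pixels)  # [[(10, 20, 30, 0), (255, 0, 5, 127)]]
```

A client receiving such an image can read colour and terrain height from a single texture, which is what makes a direct feed into the GPU attractive.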
EarthServer: Big Earth Data Analytics
  Scalable On-Demand Analytics & Fusion for all Earth Sciences
  11 partners (lead: JacobsU), 7 mus$ budget, 2011-sep-01 to 2014-aug-31
  6 * 100+ TB databases for all Earth sciences + planetary science
  www.earthserver.eu
  Advisory board: OGC, ESA, IEEE
Web Coverage Service (WCS)
  Core: simple access to multi-dimensional coverages
    subset = trim | slice
  WCS Extensions for additional functionality facets: encodings, band extraction, scaling, reprojection, interpolation, query language, data upload,...
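As a sketch of how trim and slice appear in a request, the following Python builds a WCS 2.0 GetCoverage URL in key-value-pair form, where a trim names an axis with a low and a high bound and a slice names an axis with a single point. The endpoint, the coverage identifier, and the axis names (`Lat`, `Long`, `ansi`) are hypothetical placeholders; a real service advertises its own.

```python
from urllib.parse import urlencode

# Illustrative only: endpoint, coverage id and axis names are made up.

def trim(axis, low, high):
    """Trim: keep the axis, restrict it to the interval [low, high]."""
    return f"{axis}({low},{high})"

def slice_at(axis, point):
    """Slice: cut at a single point, removing the axis from the result."""
    return f"{axis}({point})"

def get_coverage_url(endpoint, coverage_id, subsets):
    """Assemble a GetCoverage request with one 'subset' parameter per axis."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    params += [("subset", subset) for subset in subsets]
    return endpoint + "?" + urlencode(params)

url = get_coverage_url(
    "http://example.org/wcs",          # hypothetical endpoint
    "SatImageTimeSeries",              # hypothetical coverage id
    [trim("Lat", 40, 50), trim("Long", 5, 15), slice_at("ansi", '"2012-11-20"')],
)
print(url)
```

In this example the two trims keep a 2-D spatial extent, while the slice on the time axis reduces a 3-D time series to a single 2-D image.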
Integration of OGC WCS and SWE
  SWE O&M and SOS (+ friends): specialized for sensor acquisition, some complexity
  GMLCOV and WCS (+WCPS): simple, uniform schema for all coverages; scalable; versatile processing
  [Diagram: upstream acquisition (O&M + SensorML), coverage server, downstream services (GMLCOV + WCS), Semantic Web]
Conclusion: Agile Analytics
  Propose EarthServer platform, rasdaman, as contribution to CGI
    Flexible ad-hoc processing & filtering
    Working in situ on existing archives; no copying!
    Integrated n-d coverage data / metadata search
    Smooth integration with GEOSS Broker
  Scalable n-d interfaces using OGC standards
    WMS, WCS suite including WCPS, WPS
  n-d visual coverage client toolkit
    1D diagrams, 2D maps, 3D data cubes, 3D timeseries sets,...
    Dynamically composed from query results