Parallel storage, mining and visualization of environmental data archives

Mikhail Zhizhin, Dmitry Medvedev, Alexey Poyda, Dmitry Mishin and Sergei Berezin
Space Research Institute and Geophysical Center, Russian Academy of Sciences

Collaboration with Microsoft Research

2006-2009: Environmental Scenario Search Engine (ESSE)
Sites: Geophysical Center, Moscow State University and MSR Cambridge
PIs: Mikhail Zhizhin (RAS), Eric Kihn (NOAA) and Vassily Lyutsarev (MSRC)
PhD students: Alexey Poyda (Moscow State University), Dmitry Mishin and Dmitry Medvedev (RAS)
Summary: ESSE is an interactive search engine for data mining in environmental data archives. Unlike conventional text-based search engines, it searches inside the numeric datasets themselves, using fuzzy logic to describe transitions between environmental states.

2007-2009: Climate Induced Vegetation Change Analysis Tool (CLIVT)
Site: Space Research Institute and MSR
PIs: Eugeny Lupian and Mikhail Zhizhin (RAS)
PhD students: Maria Medvedeva, Alexey Poyda, Dmitry Medvedev, Dmitry Voytsehovsky
Summary: The CLIVT project brings together large archives of satellite images and historical vegetation and climate data for Northern Eurasia, and develops a new technique for studying the relations between ecosystem dynamics and climate change.

Joint Research Center IKI-MSR in Moscow
The Framework Agreement was signed at Moscow State University on March 17th, 2009. The administrative structure and research project agreements will be elaborated in June 2009.
Main directions of research: The Parties wish to collaborate on research into global change of climate, ecology and the space environment in their interrelation. This will require satellite and ground-based sensor observations together with data-intensive high-performance computing for environmental monitoring, modeling and data mining.

What are the challenges?
- Repetitive tasks in designing (very large) databases for new data products
- Interactive access times for any projection of the data array
- Never delete or overwrite data; keep lineage, accreditation, quality and type flags
- Multilayer (catalog, inventory, order, process) distributed metadata storage [STANDARD]
- A semantically rich common data model [STANDARD] and query language [STANDARD] for (environmental) scientific datasets
- Functionally rich data services supporting data extraction, processing and mining, implemented at the data server [STANDARD]
- Distributed algorithms to balance network and database load
- Data export, modeling, visualization and ingest workflows; reference web services for basic datasets and models
- Clever and seamless integration of MS Virtual Earth, Google Maps, WMS and scientific visualization libraries
- Parallel visualization algorithms (GIS?), applications and viewers for very large images, maps and video streams on tiled displays

Data processing, analysis and visualization workflow
Components in the workflow diagram: Virtual Observatory XML metadata and portal; REST and SOAP templates; OGSA-DAI Grid data services; ActiveStorage; NetCDF and NcML; NetCDF API; metadata; WMS and WCS; Virtual Earth; KML and tile servers; MM5 and WRF mesoscale weather models; Matlab.

Virtual Observatory XML metadata search engine
- Open Source middleware: VxOware
- Tiers: (1) web application; (2) REST services; (3) native XML database backend plus native object stores with indexing (documents, images)
- XML: multiple catalog-level metadata schemas, e.g. FGDC, ECHO, SPASE, NGDC Ordering Extensions
- Distributed metadata search over the VO federation using REST services
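The slide lists distributed metadata search over the VO federation via REST services. Below is a minimal sketch of the fan-out-and-merge idea, assuming each federation node can be queried by keyword and answers with a list of resource IDs. The node functions and in-memory catalogs are hypothetical stand-ins for remote REST endpoints; VxOware's actual interfaces are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A node answers a keyword query with a list of resource IDs.
# Plain functions stand in for remote REST endpoints (hypothetical).
Node = Callable[[str], List[str]]


def make_node(catalog: Dict[str, str]) -> Node:
    """Hypothetical node over an in-memory catalog {resource_id: metadata text}."""
    def search(keyword: str) -> List[str]:
        kw = keyword.lower()
        return [rid for rid, text in catalog.items() if kw in text.lower()]
    return search


def federated_search(nodes: List[Node], keyword: str) -> List[str]:
    """Query all nodes in parallel; merge hits in node order, dropping duplicate IDs."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        per_node = list(pool.map(lambda node: node(keyword), nodes))
    seen, merged = set(), []
    for hits in per_node:
        for rid in hits:
            if rid not in seen:
                seen.add(rid)
                merged.append(rid)
    return merged


n1 = make_node({"ResourceID_1": "NCEP Reanalysis air temperature",
                "ResourceID_2": "Sunspot number"})
n2 = make_node({"ResourceID_3": "Station air temperature",
                "ResourceID_1": "NCEP Reanalysis (mirror)"})
print(federated_search([n1, n2], "air"))  # ['ResourceID_1', 'ResourceID_3']
```

Deduplication matters because the same resource may be mirrored by more than one node of the federation.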

Virtual Observatory for Metadata: a complete data environment is more than just the bits
Diagram components: a metadata store holding FGDC records, each linked to one or more XML Ordering Extensions (OE); the Virtual Observatory web application or portal, which searches the metadata; a web service API for data sources (REST or SOAP), together with the OGSA-DAI client toolkit and OGSA-DAI resources and activities; a wiki with documents, a user guide, presentations and a slideshow; and data sources such as the FGDC catalog XML, CLASS, SPIDR and the ActiveStorage plugin. A search returns a list of resource IDs (ResourceID_1, ResourceID_2, ResourceID_3, ...); the diagram also shows visualization, inventory and order services and CLASS products.

Ordering Extensions XML schema: a station map XML element is rendered with XSLT into a data order web form.

Why an OGSA-DAI service container?
- Standard tool in the Grid community
- Supports distributed workflows (in version 3.*)
- Built-in support for asynchronous transactions
- Compatible with Web (Axis) and Grid (OMII, UNICORE, GT4) containers
- We looked at alternatives such as OPeNDAP and WCS; documentation of our analysis is available
- Problem 1: it is very complex. Solution: a REST wrapper.
- Problem 2: it supports only file, SQL and XML data types and queries. Solution: implement additional data sources and functions for data in multidimensional arrays.
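A REST wrapper of this kind can be thought of as a translation layer from simple URLs into the XML request documents a service container consumes. The sketch assumes a hypothetical URL pattern `/<resource>/<activity>?name=value` and invents the element names `request`, `activity` and `parameter`; it does not reproduce OGSA-DAI's real request schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit


def rest_to_request_document(url: str) -> str:
    """Translate GET /<resource>/<activity>?name=value into an XML request document.

    The URL pattern and the element names are hypothetical.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2:
        raise ValueError("expected a path of the form /<resource>/<activity>")
    resource, activity = segments
    doc = ET.Element("request", resource=resource)
    act = ET.SubElement(doc, "activity", name=activity)
    # Sorted so the document does not depend on query-string order
    for name, values in sorted(parse_qs(parts.query).items()):
        for value in values:
            ET.SubElement(act, "parameter", name=name).text = value
    return ET.tostring(doc, encoding="unicode")


print(rest_to_request_document("http://localhost/ncep_01/getData?var=air&time=0"))
```

The client only sees a URL, while everything behind the wrapper can keep using the container's native document format.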

ESSE / OGSA-DAI extensions
- Provide catalog- and inventory-level metadata about a data source
- Support a multidimensional array data model (in addition to SQL/XML/BLOB)
- Handle SOAP and REST requests for data export
- Have local data processing and fuzzy-logic data mining functions
- Provide persistent storage for data processing and environmental model output (as a new dataset)
- Can be chained into asynchronous distributed data processing workflows
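One way to picture activities chained into an asynchronous workflow, with the final output kept as a new dataset, is a list of coroutines in which each step consumes the previous step's result. This single-process sketch uses Python's asyncio; the activities (`get_data`, `smooth`, `store_as`) and the in-memory `archive` are hypothetical stand-ins for OGSA-DAI activities and persistent storage.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# An activity takes the previous step's output and returns its own.
Activity = Callable[[Any], Awaitable[Any]]


async def get_data(_: Any) -> List[float]:
    """Hypothetical source activity: a short time series read from storage."""
    await asyncio.sleep(0)  # placeholder for I/O
    return [1.0, 5.0, 3.0, 8.0]


async def smooth(xs: List[float]) -> List[float]:
    """Processing activity: 2-point moving average."""
    await asyncio.sleep(0)
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]


def store_as(archive: Dict[str, List[float]], name: str) -> Activity:
    """Keep the result under a new dataset name and pass it on unchanged."""
    async def activity(xs: List[float]) -> List[float]:
        archive[name] = list(xs)
        return xs
    return activity


async def run_chain(chain: List[Activity]) -> Any:
    data: Any = None
    for activity in chain:
        data = await activity(data)
    return data


async def run_many(chains: List[List[Activity]]) -> List[Any]:
    """Run independent chains concurrently."""
    return list(await asyncio.gather(*(run_chain(c) for c in chains)))


archive: Dict[str, List[float]] = {}
print(asyncio.run(run_chain([get_data, smooth, store_as(archive, "smoothed")])))
print(archive)
```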

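OGSA-DAI data order flow
The client obtains an Ordering Extensions (OE) web form, generated by a servlet with XSLT, submits a process document via SOAP to the data server, and receives either an XML result or an error message. On the data server, an OGSA-DAI adapter sits on top of storage and provides get data, process and mine operations over SQL, XML, granule and time-series data.
Data types: time series (sunspot number), grids (NCEP Reanalysis), stations (ionospheric soundings), swaths (AVHRR), profiles (ocean profiles), maps (nighttime lights), and possibly more.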

Environmental Scenario Search Engine
A time series is treated as a trajectory in the two-dimensional phase space of P (pressure) and T (temperature).
State $S_1$, corresponding to the red (upper-right) region, is the fuzzy expression
$$S_1 = (\text{Very Large } P) \;\text{and}\; (\text{Very Large } T)$$
State $S_2$, corresponding to the cyan (lower-left) region, is
$$S_2 = (\text{Very Small } P) \;\text{and}\; (\text{Very Small } T)$$
Combining the state descriptions with the time shift operator $\mathrm{shift}_{dt}$, we can write the following symbolic expression for the environmental scenario "very low temperature and pressure after very high temperature and pressure":
$$(\mathrm{shift}_{dt=1}\, S_1) \;\text{and}\; S_2$$
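The state and scenario expressions can be evaluated over a time series in a few lines. This sketch makes standard fuzzy-logic choices that the slide does not confirm: ramp-shaped membership functions for "large" and "small", Zadeh's concentration hedge (squaring) for "very", the minimum as fuzzy "and", and a shift operator that reads the value dt steps earlier. The pressure and temperature thresholds are hypothetical.

```python
from typing import Callable, List, Sequence

Membership = Callable[[float], float]


def large(lo: float, hi: float) -> Membership:
    """Ramp: 0 at or below lo, 1 at or above hi, linear in between (assumed shape)."""
    def mu(x: float) -> float:
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        return (x - lo) / (hi - lo)
    return mu


def small(lo: float, hi: float) -> Membership:
    """Fuzzy complement of large(lo, hi)."""
    big = large(lo, hi)
    return lambda x: 1.0 - big(x)


def very(mu: Membership) -> Membership:
    """Zadeh's concentration hedge: very A = A squared."""
    return lambda x: mu(x) ** 2


def state(mu_p: Membership, mu_t: Membership,
          p: Sequence[float], t: Sequence[float]) -> List[float]:
    """Fuzzy 'and' (minimum) of the P and T memberships at every time step."""
    return [min(mu_p(pi), mu_t(ti)) for pi, ti in zip(p, t)]


def shift(series: List[float], dt: int) -> List[float]:
    """Shift operator: step i sees the value from step i - dt (0 before the series starts)."""
    return [series[i - dt] if i >= dt else 0.0 for i in range(len(series))]


def scenario(p: Sequence[float], t: Sequence[float], dt: int = 1) -> List[float]:
    """(shift_dt S1) and S2 with hypothetical thresholds (hPa, degrees C)."""
    s1 = state(very(large(1000, 1030)), very(large(10, 30)), p, t)
    s2 = state(very(small(970, 1000)), very(small(-10, 10)), p, t)
    return [min(a, b) for a, b in zip(shift(s1, dt), s2)]


print(scenario([1030, 970], [30, -10]))  # [0.0, 1.0]
```

Each element of the result is the degree (0 to 1) to which the scenario holds at that time step, so time steps or locations can be ranked by this score.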

Screenshots: web editor for a multi-state environmental scenario, and search results.

Parallel Active Data Storage
- Open Source software developed in collaboration with MSR Cambridge
- Data provided by NCAR and NGDC NOAA
- Common Data Model and API compatible with the Unidata CDM for NetCDF/HDF
- Scalable parallel storage and processing engine based on MS SQL Server
- Capable of storing terabytes of gridded output from numerical weather models and raw meteorological station reports
- Special client library with an API, and an OGSA-DAI plugin. OGSA-DAI receives a CDM object from ActiveStorage and transforms it into different ES formats such as NcML, NetCDF and HDF.

Common Data Model (CDM)
Diagram classes: Dataset (name), Group (name), Attribute (name, value, datatype), Dimension (name, length), Variable (name, shape, datatype) and DataType (char, byte, short, int, long, float, double, String).
The Common Data Model (CDM) is an ES standard used in OPeNDAP, netCDF-4 and HDF5 as a general representation of multivariate numeric arrays. Sub-models such as grids (geophysical fields), points (observatories) and trajectories (ships, airplanes, satellites) are supported.
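Because the CDM is built from dimensions, variables and attributes, a CDM object maps naturally onto NcML, the XML form of a netCDF header. The sketch assumes a flat dataset (no nested groups) and uses the NcML 2.2 namespace with its `netcdf`, `dimension`, `variable` and `attribute` elements; the Python classes and the example content are hypothetical and are not the ActiveStorage API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("", NCML_NS)  # serialize NcML as the default namespace


def q(tag: str) -> str:
    """Qualified NcML tag name."""
    return f"{{{NCML_NS}}}{tag}"


@dataclass
class Dimension:
    name: str
    length: int


@dataclass
class Attribute:
    name: str
    value: str
    datatype: str = "String"


@dataclass
class Variable:
    name: str
    shape: Tuple[str, ...]  # dimension names, slowest-varying first
    datatype: str           # e.g. "short", "float", "double"
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    dimensions: List[Dimension] = field(default_factory=list)
    variables: List[Variable] = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)


def _add_attribute(parent: ET.Element, a: Attribute) -> None:
    ET.SubElement(parent, q("attribute"), name=a.name, type=a.datatype, value=a.value)


def to_ncml(ds: Dataset) -> str:
    root = ET.Element(q("netcdf"), location=ds.name)  # placeholder location
    for a in ds.attributes:
        _add_attribute(root, a)
    for d in ds.dimensions:
        ET.SubElement(root, q("dimension"), name=d.name, length=str(d.length))
    for v in ds.variables:
        ve = ET.SubElement(root, q("variable"), name=v.name,
                           shape=" ".join(v.shape), type=v.datatype)
        for a in v.attributes:
            _add_attribute(ve, a)
    return ET.tostring(root, encoding="unicode")


ds = Dataset(
    name="ncep_01",  # hypothetical example content
    dimensions=[Dimension("time", 2), Dimension("lat", 73), Dimension("lon", 144)],
    variables=[Variable("air", ("time", "lat", "lon"), "short",
                        [Attribute("long_name", "air temperature")])],
)
print(to_ncml(ds))
```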

Database schema to map CDM into ActiveStorage

Data retrieval scheme: single MS SQL Server
1. The user calls the client library with the array coordinates (x_min, y_min, x_max, y_max) as call parameters.
2. The client library issues commands to the database server.
3. The database server selects the requested data parts from the appropriate chunks.
4. The database server returns the data parts to the client library.
5. The client library merges the data parts and returns the whole array to the user.
The database engine performs only basic array selection and subsetting; the client library does all the rest (merging chunks, type conversion, etc.). There are two versions of the client library: .NET and Java.
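The division of labor in this scheme (the server only selects and subsets chunks; the client merges them and converts types) can be sketched for a 2-D array. Assumptions: the array is stored as fixed-size square tiles keyed by tile index, the query box is half-open (`x_max` and `y_max` exclusive), and plain Python lists stand in for the SQL Server tables; `CHUNK` and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Array2D = List[List[float]]
CHUNK = 4  # hypothetical tile edge length


def make_store(array: Array2D) -> Dict[Tuple[int, int], Array2D]:
    """Split a 2-D array (rows = y, columns = x) into CHUNK x CHUNK tiles keyed by tile index."""
    store = {}
    for y0 in range(0, len(array), CHUNK):
        for x0 in range(0, len(array[0]), CHUNK):
            store[(y0 // CHUNK, x0 // CHUNK)] = [row[x0:x0 + CHUNK] for row in array[y0:y0 + CHUNK]]
    return store


def server_select(store, x_min, y_min, x_max, y_max):
    """'Server' side: cut out the overlap of each tile with the box; no merging."""
    parts = []
    for (ty, tx), tile in store.items():
        y0, x0 = ty * CHUNK, tx * CHUNK
        ys = range(max(y_min, y0), min(y_max, y0 + len(tile)))
        xs = range(max(x_min, x0), min(x_max, x0 + len(tile[0])))
        if len(ys) and len(xs):
            parts.append((ys.start, xs.start, [[tile[y - y0][x - x0] for x in xs] for y in ys]))
    return parts


def client_get(store, x_min, y_min, x_max, y_max) -> Array2D:
    """'Client library' side: merge the parts and convert values to float."""
    out = [[0.0] * (x_max - x_min) for _ in range(y_max - y_min)]
    for y_start, x_start, part in server_select(store, x_min, y_min, x_max, y_max):
        for i, row in enumerate(part):
            for j, value in enumerate(row):
                out[y_start - y_min + i][x_start - x_min + j] = float(value)
    return out


grid = [[10 * y + x for x in range(10)] for y in range(7)]
store = make_store(grid)
print(client_get(store, 2, 1, 5, 3))  # [[12.0, 13.0, 14.0], [22.0, 23.0, 24.0]]
```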

Distributed queries: MS SQL Server database cluster
Portions of the global array can be stored on several database servers to increase performance. The client library sends queries to each SQL Server database and combines the returned parts.
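Spreading chunks over several servers changes only where the selection runs: the client sends the same range query to every server in parallel and merges whatever parts come back. The sketch uses a 1-D array, round-robin chunk placement and threads in place of database connections; all of these are illustrative assumptions rather than the ActiveStorage partitioning scheme.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

CHUNK = 5  # hypothetical chunk length
Server = Dict[int, List[float]]  # chunk start offset -> chunk values


def partition(values: Sequence[float], n_servers: int) -> List[Server]:
    """Place consecutive chunks on servers round-robin."""
    servers: List[Server] = [{} for _ in range(n_servers)]
    for k, start in enumerate(range(0, len(values), CHUNK)):
        servers[k % n_servers][start] = list(values[start:start + CHUNK])
    return servers


def query_server(server: Server, start: int, stop: int) -> List[Tuple[int, List[float]]]:
    """Runs on one server: (offset, values) for every chunk overlapping [start, stop)."""
    parts = []
    for offset, chunk in server.items():
        lo, hi = max(start, offset), min(stop, offset + len(chunk))
        if lo < hi:
            parts.append((lo, chunk[lo - offset:hi - offset]))
    return parts


def distributed_get(servers: List[Server], start: int, stop: int) -> List[float]:
    """Send the same range query to all servers concurrently and merge the parts."""
    out: List[float] = [0.0] * (stop - start)
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        for parts in pool.map(lambda s: query_server(s, start, stop), servers):
            for lo, part in parts:
                out[lo - start:lo - start + len(part)] = part
    return out


servers = partition([float(v) for v in range(23)], n_servers=3)
print(distributed_get(servers, 3, 12))
```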

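NetCDF API for ActiveStorage in MATLAB

```matlab
% Java classes: the ActiveStorage NetCDF connector and the Microsoft JDBC driver
import ru.wdcb.mdb.NcConnector
import com.microsoft.sqlserver.jdbc.SQLServerDriver

% JDBC connection string for the ncep_01 database
s = 'jdbc:sqlserver://localhost:1433;databaseName=ncep_01;user=guest;password=guest';

connector = NcConnector();
ncid = connector.nc_open(s, 0);
varid = connector.nc_inq_varid(ncid, 'air');

% Time series at one grid point: 80000 steps along the first dimension
% (count is used instead of size to avoid shadowing the built-in size function)
origin = [0 0 10 10];
count  = [80000 1 1 1];
stride = [1 1 1 1];
A = connector.nc_get_vars_short(ncid, varid, origin, count, stride);
plot(A, 'DisplayName', 'A', 'YDataSource', 'A');

% One 73 x 144 horizontal field (the 2.5-degree global grid)
figure
origin = [0 0 0 0];
count  = [1 1 73 144];
stride = [1 1 1 1];
B = connector.nc_get_vars_shortm(ncid, varid, origin, count, stride);
B = reshape(B, [73 144]);
imagesc(B);
figure(gcf);
```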

NCEP/NCAR Weather Reanalysis
- Continually updated gridded dataset
- Global circulation model output
- 74 weather parameters
- 5000 NetCDF files, 30-500 MB each
- Time coverage: 1948-2008, four values per day (6-hourly)
- Grids: regular 2.5 x 2.5 degree grid; T62 Gaussian grid, 192 x 94 points
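The grid and time figures translate directly into array sizes. Assuming the regular 2.5-degree grid includes both poles and the reanalysis has four values per day, a quick calculation gives the 73 x 144 horizontal shape requested in the MATLAB example above and the number of time steps for 1948-2008.

```python
from datetime import date


def regular_grid_shape(step_deg: float) -> tuple:
    """(n_lat, n_lon) for a global regular grid that includes both poles."""
    n_lat = round(180 / step_deg) + 1  # -90 ... +90 inclusive
    n_lon = round(360 / step_deg)      # 0 ... 360 - step (360 repeats 0)
    return n_lat, n_lon


def n_time_steps(first_year: int, last_year: int, per_day: int = 4) -> int:
    """Time steps from 1 January of first_year to 31 December of last_year inclusive."""
    days = (date(last_year + 1, 1, 1) - date(first_year, 1, 1)).days
    return days * per_day


n_lat, n_lon = regular_grid_shape(2.5)
steps = n_time_steps(1948, 2008)
print(n_lat, n_lon, steps)        # 73 144 89124
print(n_lat * n_lon * 2 * steps)  # 1873742976 bytes at 2 bytes per short value
```

So one level of a single short-valued parameter on the 2.5-degree grid is roughly 1.9 GB before compression, which is why per-file sizes in the hundreds of megabytes are typical.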

NCDC Meteorological Observation Records
- Fixed ground stations, ships, mobile stations, buoys
- Time coverage: 1901-2008
- 30 million sensors
- 470,000 ASCII files packed with gzip: 50 GB packed, 400 GB unpacked
- 1.7 billion observations
Figure: map of the meteorological stations in the database.

Integration of remote sensing and climate data in CLIVT
NDVI is averaged by land cover type (from the GLC2000 land cover map) over a regular 2.5 x 2.5 cell grid used for data integration. This yields multi-annual NDVI time series by land cover type (for example, NDVI for Evergreen Needleleaf Forest) for ten-day periods ("decades", e.g. the 1st and 2nd decades of June 1999 and the 3rd decade of June 2007), which are analyzed together with multi-annual time series of meteorological data such as air temperature.
Figure panels: land cover map GLC2000; NDVI averaging for the 2.5 x 2.5 cell grid by land cover type; multi-annual NDVI time series; air temperature; integrated analysis.
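Averaging NDVI by land cover type within each grid cell amounts to a grouped mean keyed by (cell, class). In the sketch, pixels are given as (lat, lon, land cover class, NDVI) tuples and cells are assumed to be 2.5 degrees on a side, indexed from 90S and 0E; the input format, class labels and indexing are illustrative assumptions, not the CLIVT implementation.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CELL_DEG = 2.5  # cell size of the integration grid (assumed to be degrees)

Pixel = Tuple[float, float, str, float]  # (lat, lon, land cover class, NDVI)


def cell_index(lat: float, lon: float) -> Tuple[int, int]:
    """Row counted from 90S, column counted from 0E."""
    return math.floor((lat + 90.0) / CELL_DEG), math.floor((lon % 360.0) / CELL_DEG)


def average_ndvi(pixels: Iterable[Pixel]) -> Dict[Tuple[Tuple[int, int], str], float]:
    """Mean NDVI per (grid cell, land cover class)."""
    sums: Dict = defaultdict(float)
    counts: Dict = defaultdict(int)
    for lat, lon, cover, ndvi in pixels:
        key = (cell_index(lat, lon), cover)
        sums[key] += ndvi
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


pixels = [
    (55.1, 37.2, "evergreen needleleaf", 0.6),
    (55.9, 37.4, "evergreen needleleaf", 0.8),
    (55.5, 37.3, "cropland", 0.4),
    (60.0, 37.3, "evergreen needleleaf", 0.2),
]
for key, mean in sorted(average_ndvi(pixels).items()):
    print(key, round(mean, 3))
```

Repeating this for every ten-day composite gives one NDVI value per cell and class per period, i.e. the multi-annual time series described above.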

Web technologies for visualizing different data types with geolocation
- KML and GeoRSS
- Web services for CDM data sources
- OGC web services: WMS/WFS/WCS
- MS Virtual Earth
- Google Maps

VisualESSE plugin for the NASA World Wind desktop client. CodePlex Open Source project: http://www.codeplex.com/visualesse

MS Virtual Earth, OGC Web Map Service and NcML grid overlays
Figure panels: OGC WMS web map image with transparency control; stable world nighttime lights by NGDC NOAA; NcML grid extracted from ActiveStorage; current surface temperature by NWS NOAA.
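Overlaying a WMS layer with transparency control, as in the nighttime-lights example, comes down to a GetMap request that asks for a transparent PNG. The sketch builds a WMS 1.1.1 GetMap URL; the server URL and layer name are hypothetical placeholders.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url: str, layer: str, bbox, width: int, height: int,
                   transparent: bool = True) -> str:
    """WMS 1.1.1 GetMap URL; bbox = (min_lon, min_lat, max_lon, max_lat) in EPSG:4326."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",           # default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",  # PNG keeps the alpha channel
        "TRANSPARENT": "TRUE" if transparent else "FALSE",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer name
print(wms_getmap_url("https://example.org/wms", "nighttime_lights",
                     (-180, -90, 180, 90), 1024, 512))
```

The map client then draws the returned image over the base map and varies its opacity to implement the transparency control.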

Reanalysis and forecast weather data fusion
For a selected pushpin: 50 years of weather history from the NCEP/NCAR Reanalysis database and a 1-week weather forecast from the NWS database.
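Fusing history and forecast for a pushpin involves two steps: finding the grid cell nearest to the pushpin and joining the two time series. The sketch assumes a 2.5-degree grid whose first row is 90N and whose first column is 0E, and lets forecast values replace reanalysis values at any overlapping time; the data layout and the override rule are assumptions.

```python
from typing import Dict, List, Tuple

STEP = 2.5  # grid spacing in degrees


def nearest_index(lat: float, lon: float) -> Tuple[int, int]:
    """Nearest (row, col) on a global grid with row 0 at 90N and col 0 at 0E (assumed order)."""
    row = round((90.0 - lat) / STEP)
    col = round((lon % 360.0) / STEP) % round(360 / STEP)  # wrap 360E back to column 0
    return row, col


def fuse(history: List[Tuple[int, float]],
         forecast: List[Tuple[int, float]]) -> List[Tuple[int, str, float]]:
    """Join (time, value) series; forecast values replace history at the same time."""
    merged: Dict[int, Tuple[str, float]] = {t: ("reanalysis", v) for t, v in history}
    merged.update({t: ("forecast", v) for t, v in forecast})
    return [(t, src, v) for t, (src, v) in sorted(merged.items())]


print(nearest_index(55.75, 37.62))  # (14, 15)
print(fuse([(1, 10.0), (2, 11.0)], [(2, 12.0), (3, 13.0)]))
```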

Fuzzy search and Virtual Earth mapping of environmental events
- Search for events at given locations
- Select a set of fuzzy scenarios from the VO library and a time interval (history and forecast)
- XSL transform of the search engine XML output into KML
- Map the KML: any location, any events, any time window
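The slide turns the search engine's XML output into KML with an XSL transform. The sketch below does the equivalent mapping with Python's ElementTree instead of XSLT, assuming a hypothetical result format in which each `event` element carries `scenario`, `lat`, `lon`, `time` and `score` attributes; the output uses the standard KML 2.2 `Placemark`, `TimeStamp` and `Point` elements.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)


def k(tag: str) -> str:
    """Qualified KML tag name."""
    return f"{{{KML_NS}}}{tag}"


def results_to_kml(results_xml: str) -> str:
    """Map each <event> of a (hypothetical) search result into a time-stamped KML Placemark."""
    src = ET.fromstring(results_xml)
    kml = ET.Element(k("kml"))
    doc = ET.SubElement(kml, k("Document"))
    for ev in src.iter("event"):
        pm = ET.SubElement(doc, k("Placemark"))
        ET.SubElement(pm, k("name")).text = ev.get("scenario")
        ET.SubElement(pm, k("description")).text = f"score = {ev.get('score')}"
        stamp = ET.SubElement(pm, k("TimeStamp"))
        ET.SubElement(stamp, k("when")).text = ev.get("time")
        point = ET.SubElement(pm, k("Point"))
        # KML coordinates are longitude,latitude
        ET.SubElement(point, k("coordinates")).text = f"{ev.get('lon')},{ev.get('lat')}"
    return ET.tostring(kml, encoding="unicode")


results = ('<results><event scenario="cold after warm" lat="55.75" lon="37.62" '
           'time="2008-01-15" score="0.8"/></results>')
print(results_to_kml(results))
```

The `TimeStamp` on each placemark is what lets a map client filter events by time window.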

UIC SAGE 3.0 ported to MS Windows
- Fully functional, not only local display
- PsTools utilities instead of rsh
- Uses Windows built-in security
- Existing applications: JuxtaView, bitplayer, mplayer
- Library for .NET interoperability
- WorldWind for SAGE
http://www.codeplex.com/winsage

MultiViewer application
Figure: a UI controller and six rendering clients (numbered 1 to 6) mapped onto the display tiles.
- Each node performs data fetching, processing and rendering
- Better utilization of video cluster resources
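On a tiled display, each rendering node only needs the part of the image that falls on its own screen. A minimal sketch of that decomposition, assuming a 4 x 3 wall of 12 equal displays (matching the 12-display videowall of the Transparent Data Cube) with one viewport per display; the wall layout and function name are illustrative.

```python
from typing import List, Tuple

Viewport = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1 and y1 exclusive


def tile_viewports(img_w: int, img_h: int, cols: int, rows: int) -> List[Viewport]:
    """Split an img_w x img_h image into cols x rows viewports, row by row from the top-left."""
    tiles = []
    for r in range(rows):
        for c in range(cols):
            x0, x1 = c * img_w // cols, (c + 1) * img_w // cols
            y0, y1 = r * img_h // rows, (r + 1) * img_h // rows
            tiles.append((x0, y0, x1, y1))
    return tiles


# Hypothetical 4 x 3 wall: each node fetches and renders only its own viewport
for node, vp in enumerate(tile_viewports(8000, 4500, cols=4, rows=3), start=1):
    print(f"node {node}: {vp}")
```

Integer division keeps neighbouring viewports edge to edge with no gaps or overlaps, even when the image size is not a multiple of the wall size.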

Transparent Data Cube
All HPC components from the CLIVT toolbox can run on the same parallel cluster. At the IKI Computing Center in Moscow we use a 12-node cluster with WCC MPI for MM5, MS SQL Server databases for ActiveStorage, and a 12-display videowall for MultiViewer. We call this parallel installation for storage, modeling and visualization the Transparent Data Cube.

Directions for further research
- Continue the analysis of climate-biosphere interactions
- Sun-Earth connections, including climate, ionosphere, magnetosphere and cosmic rays
- Data-intensive and cloud computing on the Microsoft HPC/Azure platform for remote sensing, environmental databases and sensor networks
- Tiled display / Virtual Earth / Deep Zoom / SAGE visualization platform / WorldWide Telescope
- Multispectral micro-remote sensing for art conservation