Maximising the utility of OpeNDAP datasets through the NetCDF4 API

Similar documents
Benchmarking OPeNDAP services for modern ESM data workloads

The Big Data Bioinformatics System

OpenDAP configuration course

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

Transparent Optimization of Grid Server Selection with Real-Time Passive Network Measurements. Marcia Zangrilli and Bruce Lowekamp

CHAPTER FIVE RESULT ANALYSIS

Performing Load Capacity Test for Web Applications

AppDynamics Lite Performance Benchmark. For KonaKart E-commerce Server (Tomcat/JSP/Struts)

Dragon NaturallySpeaking and citrix. A White Paper from Nuance Communications March 2009

Data Centers and Cloud Computing

USING VIRTUAL MACHINE REPLICATION FOR DYNAMIC CONFIGURATION OF MULTI-TIER INTERNET SERVICES

Understanding the Benefits of IBM SPSS Statistics Server

Data Centers and Cloud Computing. Data Centers

SOLUTION BRIEF: SLCM R12.8 PERFORMANCE TEST RESULTS JANUARY, Submit and Approval Phase Results

The ORIENTGATE data platform

STI Hardware Specifications for PCs

JBoss Seam Performance and Scalability on Dell PowerEdge 1855 Blade Servers

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

Data Centers and Cloud Computing. Data Centers

NetCDF and HDF Data in ArcGIS

Muse Server Sizing. 18 June Document Version Muse

Infor Web UI Sizing and Deployment for a Thin Client Solution

The Monitis Monitoring Agent ver. 1.2

Shoal: IaaS Cloud Cache Publisher

Performance Analysis and Capacity Planning Whitepaper

Merge CADstream. For IT Professionals. Merge CADstream,

DELL. Virtual Desktop Infrastructure Study END-TO-END COMPUTING. Dell Enterprise Solutions Engineering

owncloud Enterprise Edition on IBM Infrastructure

SMART Bridgit software

Evaluation of Open Source Data Cleaning Tools: Open Refine and Data Wrangler

Data Centers and Cloud Computing. Data Centers. MGHPCC Data Center. Inside a Data Center

Legal Notices Introduction... 3

Online Remote Data Backup for iscsi-based Storage Systems

Liferay Portal s Document Library: Architectural Overview, Performance and Scalability

IT Business Management System Requirements Guide

Performance Test Results Report for the Sled player

Deploying in a Distributed Environment

SuperGIS Server 3 Website Performance and Stress Test Report

Microsoft Exchange Server 2003 Deployment Considerations

Oracle Applications Release 10.7 NCA Network Performance for the Enterprise. An Oracle White Paper January 1998

Brainlab Node TM Technical Specifications

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

NetIQ Access Manager 4.1

Liferay Portal Performance. Benchmark Study of Liferay Portal Enterprise Edition

Zeus Traffic Manager VA Performance on vsphere 4

Microsoft Dynamics AX 2012 System Requirements. Microsoft Corporation Published: November 2011

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

GraySort and MinuteSort at Yahoo on Hadoop 0.23

System requirements for MuseumPlus and emuseumplus

Performance Analysis of Web based Applications on Single and Multi Core Servers

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April Page 1 of 12

HPSA Agent Characterization

Performance Optimization For Operational Risk Management Application On Azure Platform

Introduction to Cloud Computing

Microsoft Dynamics AX 2012 System Requirements. Microsoft Corporation Published: August 2011

INUVIKA OPEN VIRTUAL DESKTOP FOUNDATION SERVER

Scaling in a Hypervisor Environment

PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE

Model examples Store and provide Challenges WCS and OPeNDAP Recommendations. WCS versus OPeNDAP. Making model results available through the internet.

Cross-domain Identity Management System for Cloud Environment

Achieving a High-Performance Virtual Network Infrastructure with PLUMgrid IO Visor & Mellanox ConnectX -3 Pro

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid

Performance test report

User Installation Guide

Knowledge Article Performance Comparison: BMC Remedy ITSM Incident Management version Vs on Windows

AgencyPortal v5.1 Performance Test Summary Table of Contents

Performance Analysis of Lucene Index on HBase Environment

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies

International Journal of Computer & Organization Trends Volume20 Number1 May 2015

VI Performance Monitoring

Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking

Sage ERP Accpac. Compatibility Guide Version 6.0. Revised: November 18, Version 6.0 Compatibility Guide

HYCOM Data Management -- opportunities & planning --

Riverbed Stingray Traffic Manager VA Performance on vsphere 4 WHITE PAPER

How To Use Chronos Mount For Astronomical Use

How To Use A Vmware View For A Patient Care System

Mobile Cloud Computing for Data-Intensive Applications

The distribution of marine OpenData via distributed data networks and Web APIs. The example of ERDDAP, the message broker and data mediator from NOAA

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

Data Lab System Architecture

W H I T E P A P E R : T E C H N I C A L. Understanding and Configuring Symantec Endpoint Protection Group Update Providers

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

Bridgit 4.6 software

Mobile Application Development

Distributed Framework for Data Mining As a Service on Private Cloud

This guide specifies the required and supported system elements for the application.

Enabling Technologies for Distributed Computing

Hardware & Software Specification i2itracks/popiq

Transcription:

Maximising the utility of OpeNDAP datasets through the NetCDF4 API Stephen Pascoe (Stephen.Pascoe@stfc.ac.uk) Chris Mattmann (chris.a.mattmann@jpl.nasa.gov) Phil Kershaw (Philip.Kershaw@stfc.ac.uk) Ag Stephens (Ag.Stephens@stfc.ac.uk) 1

Opportunities for the NetCDF/OPeNDAP Platform Application Local Files Script High-Level Tools NetCDF-API WAN ESGF Security OPeNDAP Server Remote Files Virtual Datasets Web App. LAN ESGF Security OPeNDAP Server Remote Files Virtual Datasets 2

Key Questions What performance can a user expect today from highlevel interaction with OPeNDAP datasets through the NetCDF API? Can OPeNDAP servers cope with these scenarios? How can we improve from where we are? 3

OpeNDAP Test Framework http://github.com/stephenpascoe/dapbench Tools to intercept OPeNDAP requests from clients for analysis Load-testing framework for testing OPeNDAP servers Specific tests discussed here pre-alpha work in progress 4

Test System Client Measured Bandwidth: 930Mbps (iperf) LAN Server DELL optiplex 980. Intel i5 4-cores @ 3.2GHz 1Gbps NIC NetCDF 4.1.2-beta2, HDF5-1.8.4-patch1 Xen Virtual Machine Host: DELL PowerEdge 2950 III 2x quad-core CPU, 24GB RAM Guest VM: Paravirtualised OpenSuSE 11.0 2 CPU, 8GB RAM, 4GB swap Test Servers Hyrax: OLFS-1.7.1, bes-3.9.0, dap-server-4.1.0 THREDDS Data Server: 3.17.3.1 Pydap: 3.0.rc.15. netcdf4-python-0.9.3. Platforms 64-bit Java SDK 1.6.0-13, Tomcat-6.0.20. JAVA_OPTS=-Xms1000M -Xmn500M -Xmx1500M Apache-2.2.8 (prefork), mod_wsgi-3.3 5

Client Tests Test Clients Test Dataset CDAT (cdat-lite 6.0-alpha-3) 4D temperature field CDO 1.4.7 shape [120,4,144,192] time/level/lat/lon 51MB NetCDF file. 1. Take a 45ºx45º subset of entire field Test Requests 2. Regrid entire field to 100x100 lat/lon resolution 6

Client Results Test Tool DODS Requests Download Size Comment Subset CDO 481 50Mb 1 + time*level requests CDAT 17281 1.5Mb 1 + time*level*lat requests Regrid CDO 481 50Mb 1 + time*level requests CDAT 69121 50Mb 1 + time*level*lat requests CDO downloads too much and with sub-optimal number of requests. CDAT downloads just enough but with very sub-optimal number of requests. CDAT iterates over all outer dimensions. 7

Server Tests Test Framework Test Dataset Grinder load-testing framework NetCDF-Java API Python httplib2 30 files of 4D temperature field shape [120,4,144,192] time/level/lat/lon 600MB per NetCDF file. Test Requests 1. Request all of a random file in n slices, n = [15, 30, 60, 120, 240, 720, 1440] 2. Continuously reqest random subsets in m threads, m = [1,.. 20] 8

Health Warning Preliminary Results These data may say as much about the test environment as the servers tested 9

Server Test 1: ramp slices ~360Mbps Time to download whole field Equivilent Bandwidth 200000 50.00 180000 45.00 160000 40.00 140000 35.00 120000 30.00 Time / ms 100000 80000 MB/s pydap 25.00 tds hyrax 20.00 pydap tds hyrax 60000 15.00 40000 10.00 20000 5.00 0 0 200 400 600 800 1000 1200 1400 1600 Requests 0.00 0 200 400 600 800 1000 1200 1400 1600 Requests 10

Server Test 2: Parallel 11

Solutions 2 Ideas 12

How do we save clients from themselves? What if clients were forced to cache data themselves? What if the server only responded to sensible sized requests? What if we could restrict the number of allowed requests and cache them? 13

DAP Response tiling / chunking Browser Cache WMS web-clients use tiling to improve client & server performance Proxy Cache Tiled WMS Tile Cache WMS Tiles Server 14

REST Constraints Client-server: separation of concerns Stateless: no client-context stored on the server Cacheable: responses can be cached to improve performance Layered: clients are unaware whether they are connected directly to the end server OPeNDAP? Code on demand (optional): Servers can extend clients with custom code? Uniform Interface: generic interface between client and server decouples them allowing independent evolution. 15

Big wins for caching If tiling/chunking could help standard OpeNDAP performance it could hugely help server-side processing Consider http://example.com/mydataset.nc.dods?zonalmean(tas) But not all server-side processing can be expressed as a URI and complete synchronously... 16

Beyond Synchronous OPeNDAP We know we have to develop server-side processing systems How do we keep them RESTful and in the spirit of OPeNDAP? So that the outputs remain reusable and cacheable 17

OODT Data Processing Architecture C. Mattmann, D. Freeborn, D. Crichton, B. Foster, A. Hart, D. Woollard, S. Hardman, P. Ramirez, S. Kelly, A. Y. Chang, C. E. Miller. A Reusable Process Control System Framework for the Orbiting Carbon Observatory and NPP Sounder PEATE missions. In Proceedings of the 3rd IEEE Intl Conference on Space Mission Challenges for Information Technology (SMC-IT 2009), pp. 165-172, July 19-23, 2009. 18

WPS Process Workflow GET DescribeProcess resource to discover process arguments POST to create a process execution resource (unique URL) GET to poll status of process execution Navigate to outputs when available 19

Baby steps towards WPS/OPeNDAP integration WPS Outputs as OpeNDAP Datasets 20

Is there a generic lightweight pattern? A Standard way to say Processing, Come back later Where should the process execution description? Specific DAP response Containers, e.g. Catalogues Provenance responses Describing processes and executions is an extra step 21

Summary Clients and Servers need to improve to support efficient highlevel OPeNDAP access through the NetCDF API Benchmarking could help us find where we need to focus our work Caching and server-side processing should be part of the solution But keep it lightweight and RESTful 22

Thanks Stephen.Pascoe@stfc.ac.uk http://github.com/stephenpascoe/dapbench 23

24

TDS Pydap Hyrax 25