EIDA WFCatalog Service!!! Luca Trani and the EIDA Team

Similar documents
Network of European Research Infrastructures for Earthquake Risk Assessment and Mitigation. Report

Learning Management Redefined. Acadox Infrastructure & Architecture

Cloud Scale Distributed Data Storage. Jürmo Mehine

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

Technical Overview Simple, Scalable, Object Storage Software

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

DataNet Flexible Metadata Overlay over File Resources

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics Nokia

Client Overview. Engagement Situation. Key Requirements

Big Data and Analytics (Fall 2015)

Apache Sling A REST-based Web Application Framework Carsten Ziegeler cziegeler@apache.org ApacheCon NA 2014

Performance Analysis for NoSQL and SQL

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

Report. Architecture specification for data. workbench

INTRODUCTION TO CASSANDRA

Interaction with other IT projects: EUDAT2020, VLDATA, ENVRI PLUS,

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

How graph databases started the multi-model revolution

How SourceForge is Using MongoDB

Fast Innovation requires Fast IT

CI Pipeline with Docker

How to Design and Create Your Own Custom Ext Rep

Applications for Big Data Analytics

Exploiting peer group concept for adaptive and highly available services

Querying MongoDB without programming using FUNQL

Architecting Open source solutions on Azure. Nicholas Dritsas Senior Director, Microsoft Singapore

Scalable Architecture on Amazon AWS Cloud

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

A Near Real-Time Personalization for ecommerce Platform Amit Rustagi

Building Internet of Things Apps with Cloudant DBaaS. Andy Ellicott, Cloudant John David Chibuk, Kiwi Wearables

Web Application Hosting in the AWS Cloud Best Practices

Performance and Scalability Overview

Big Data. Facebook Wall Data using Graph API. Presented by: Prashant Patel Jaykrushna Patel

Emerging Requirements and DBMS Technologies:

Mobile + HA + Cloud. Eugene Ciurana! ! pr3d4t0r - irc.freenode.net! ##java, ##security, #awk, #python, #bitcoin! irc.oftc.net: #tor, #tor-dev, #tails!

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

Performance and Scalability Overview

Big data and urban mobility

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

TECHNOLOGY WHITE PAPER Jun 2012

Sentimental Analysis using Hadoop Phase 2: Week 2

Summary of Alma-OSF s Evaluation of MongoDB for Monitoring Data Heiko Sommer June 13, 2013

Scaling Pinterest. Yash Nelapati Ascii Artist. Pinterest Engineering. Saturday, August 31, 13

An Approach to Implement Map Reduce with NoSQL Databases

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco


Reference Model for Cloud Applications CONSIDERATIONS FOR SW VENDORS BUILDING A SAAS SOLUTION

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

5/01/2013 CLOUD ARCHITECTURE

Document Oriented Database

LARGE-SCALE DATA STORAGE APPLICATIONS

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

nosql and Non Relational Databases

Building the Internet of Things Jim Green - CTO, Data & Analytics Business Group, Cisco Systems

EDG Project: Database Management Services

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015

Welcome to The Future of Analytics In Action

A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases

Big Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013

NoSQL Data Base Basics

NoSQL in der Cloud Why? Andreas Hartmann

Ganzheitliches Datenmanagement

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

GoodData. Platform Overview

Towards Smart and Intelligent SDN Controller

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

BIG DATA AGGREGATOR STASINOS KONSTANTOPOULOS NCSR DEMOKRITOS, GREECE. Big Data Europe

a division of Technical Overview Xenos Enterprise Server 2.0

CSCI-UA: Database Design & Web Implementation. Professor Evan Sandhaus sandhaus@cs.nyu.edu evan@nytimes.com

Information Retrieval Elasticsearch

Securing NoSQL Clusters

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Using IBM dashdb With IBM Embeddable Reporting Service

Hacettepe University Department Of Computer Engineering BBM 471 Database Management Systems Experiment

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

Crazy NoSQL Data Integration with Pentaho

D5.3.2b Automatic Rigorous Testing Components

Search and Real-Time Analytics on Big Data

WELCOME TO The Future of Analytics in Action: The Art of the Possible

Removing Failure Points and Increasing Scalability for the Engine that Drives webmd.com

InfiniteGraph: The Distributed Graph Database

Monitoring Elastic Cloud Services

Functional Requirements for Digital Asset Management Project version /30/2006

CA Technologies Big Data Infrastructure Management Unified Management and Visibility of Big Data

Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture

MongoDB Developer and Administrator Certification Course Agenda

Structured Data Storage

Big Data Management. Big Data Management. (BDM) Autumn Povl Koch September 30,

In Memory Accelerator for MongoDB

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES

M Designing and Implementing OLAP Solutions Using Microsoft SQL Server Day Course

NoSQL document datastore as a backend of the visualization platform for ECM system

Building a Scalable News Feed Web Service in Clojure

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Transcription:

EIDA WFCatalog Service Luca Trani and the EIDA Team FDSN WGIII meeting, Prague, June 29. 2015

EIDA WFCatalog Service Provides a well defined API to query for seismic waveform metadata (including QC) Enables continuous waveforms discovery based on metadata => no unnecessary data downloads Preliminary analysis and processing of seismic waveforms moved to data centers => big advantage for users who can quickly browse data and related features More than just QC the waveform metadata catalog offers functionalities to discover the best waveform data matching a variety of criteria combining multiple quality metrics and metadata extracted from MSEED files Based on a set of clear and well defined quality metrics which together with the API have been discussed and agreed among the major data centers in Europe Example of query: http://orfeus-eu.org/ws/wfcatalog/query?starttime=2015-06-01&endtime=2015-06-02&sta=hgn&cha=bhz [{"net":"nl","sta":"hgn","cha":"bhz","loc":"02","sample_rate":40.0,"record_length": 512,"quality":"D","num_gaps":0,"num_overlaps":0,"gaps_len":0,"overlaps_len":0,"sample_max": 4319,"sample_min":-3115, "sample_rms":1039.382142, "sample_mean":2563.56, "sample_stdev":1016.847836, "ms_glitches":0,"ms_amplifier_saturation":0,"ms_digital_filter_charging":1,"ms_digitizer_clipping": 0,"ms_missing_padded_data":0,"ms_spikes":1,"ms_suspect_time_tag":0,"ms_telemetry_sync_error": 0,"ms_calibration_signal":0,"ms_event_begin":0,"ms_event_end":0,"ms_event_in_progress": 0,:"ms_timing_correction":0,"ms_clock_locked":0,"percent_availability":100.0, start_time":"2015-05-

WFCatalog QC metrics QC metrics presented and discussed in the FDSN WGII meeting The selection of metrics and their definitions adopted in the WFCatalog result from the integration and harmonisation of: - metrics produced within the NERA EC project and discussed in EIDA with broad consensus among the major european data centers - metrics currently present in MUSTANG num_gaps num_overlaps sample_max sample_rms Metric Name Description Duration of gaps Duration of overlaps Maximum sample value RMS sample value EIDA provides a complete package to extract and manage QC and wf metadata including: formal definitions and service specification software stack for computation of metrics framework for wf metadata management: including repository and web service layer.

WFCatalog specification The specification of the WFCatalog is the result of a joint collaboration among EIDA partners The design has been guided by the following principles: Maximise compliance with existing FDSN standard services like dataselect Facilitate compliance/interoperability with existing systems like MUSTANG Provide flexibility to address different use cases Enable future extensions e.g: number of metrics and different granularities Enable integration within the broader EIDA NG architecture

WFCatalog prototype architecture Prototype service currently in operation/testing at different datacenters - scalable and flexible architecture based on a few key components and selected technologies - big data ready @ODC Metadata repository serving both WFCatalog and FDSN Dataselect => filtering extensions out of the box

WFCatalog: metadata repository Technological investigation started in 2012 Experiments carried on with different DB technologies: MonetDB, MySql, NoSQL (Cassandra, CouchDB, MongoDB) Main requirements: performance, scalability, efficient big data handling, flexibility and extensibility Chosen technology addresses efficiently the desired requirements : flexible, scalable, robust and highly performant also in single node mode. Other features: - efficient caching system - schema-less => collections can be dynamically modified - multi language support - built-in json API - powerful indexing features - support for high availability and clustering - active community and support