THE IRIS DATA MANAGEMENT CENTER



ELECTRONIC SEISMOLOGIST

Steve Malone
Geophysics AK-50
University of Washington
Seattle, WA 98195
E-mail: steve@geophys.washington.edu
Phone: (206) 665-3811
Fax: (206) 543-0489

Foreword

During the past year the Electronic Seismologist column focused on various aspects of the Internet from a seismologist's perspective. During the coming year this column will cover in some detail several specific sources of seismic data and information. Major data centers will be the topic of two columns, and special data access techniques such as automatic data request managers (AutoDRM) will also be covered. Guest authors for this column are encouraged. If anyone has a short article that might be appropriate for this column, please contact Steve Malone (e-mail: steve@geophys.washington.edu).

In this issue the guest author for the Electronic Seismologist is Dr. Tim Ahern, the Program Manager for the Incorporated Research Institutions for Seismology (IRIS) Data Management System (DMS). Many people's main contact with IRIS is through the IRIS Data Management Center (DMC), the heart of the DMS. Located in Seattle, Washington, the DMC is responsible for archiving and distributing the products of the other major programs of IRIS, the Global Seismic Network (GSN) and the Program for Array Seismic Studies of the Continental Lithosphere (PASSCAL). Other parts of the Data Management System include data collection centers, data quality control groups, and analysis and development groups. Details of the IRIS organization and activities can be accessed electronically through the WWW at URL http://www.iris.edu.

For many newcomers to the IRIS DMC, sorting through the wealth of information and data stored there can be initially confusing. The following article gives a nice overview of how the IRIS DMC has developed, the nature of the data stored there, and, most importantly, a comprehensive summary of many of the tools that assist the user in finding and obtaining the data or information of interest. While the Electronic Seismologist appreciates the easy-to-remember agrarian-based names of many of the data access programs and facilities of the DMC, he does not necessarily approve of the specific acronyms they are supposedly based on. However, if the exponential trend for data volume shown in some of the following figures is an accurate forecast of the future, then such access tools, by any name, will be increasingly important to assist us in getting to the data we need.

One important service of the IRIS DMC not covered in this current article is the near real-time waveform acquisition system called SPYDER. Since the Electronic Seismologist was the original developer of the SPYDER system, he will cover it in detail in a later column.

THE IRIS DATA MANAGEMENT CENTER

Tim Ahern
IRIS DMS Program Manager

The IRIS DMC's principal task is to archive data from the IRIS GSN and PASSCAL programs and to distribute these data to researchers when requested. Nevertheless, the IRIS DMC also acts as a central archive and distribution point for data from a variety of other networks. One of the most important data sets is from the Federation of Digital Seismographic Networks (FDSN). In its role as the FDSN Data Center for Continuous Data, the IRIS DMC routinely receives data from most members of the FDSN, including Canada, China, Czech Republic, France, Germany, Italy, Japan, and Russia, as well as data from several arrays and networks operated by IRIS in the Former Soviet Union. Some historical data from the Iranian Long Period Array (ILPA), the Alaskan Long Period Array (ALPA), and the Large Aperture Seismic Array (LASA) are also archived. The DMC acts as the archive and distribution point for data from the Southern California TERRAscope network as well as the primarily short-period data from the Pacific Northwest Seismic Network.
IRIS is now a member of the Council of the National Seismic System (CNSS), which coordinates activities of most regional networks within the United States as well as the US National Seismic Network. Plans are now in place to archive selected data from the Northern California Network and the ANZA array, and it is likely that data from most other members of the CNSS will be made available through either the IRIS DMC or the Northern California Earthquake Data Center (NCEDC) at UC Berkeley.

The IRIS DMC implemented a new method of parallel archiving during the past year, which significantly increased both the rate at which data can be archived and the number of independent data sources that can be archived simultaneously. Figure 1 shows the amount of data from permanent seismic recording stations now in the IRIS DMC archive. This graph does not include data from temporary deployments such as PASSCAL. Most of the data in the archive have been compressed so that one byte generally contains one sample. Data are stored in the archive in two different sort orders, once by time and once by station.

Seismological Research Letters Volume 67, Number 3 May/June 1996

[Figure 1: IRIS DMC Archive Growth]

Figure 1. This figure shows the growth of the IRIS DMC archive since the DMC moved to Seattle in 1992. Although data from more than two dozen networks are archived at the DMC, this diagram groups them into four fundamental types of seismic sources for clarity of presentation. Prior to 1995, the rate of growth of the archive was roughly 330 gigabytes per year. During 1995 it increased to about 1500 gigabytes (1.5 terabytes) per year.

Two features are very noticeable in Figure 1. Although the figure groups data into the four primary types of data sources, the number of networks archived actually increased from four to twenty-four during 1995. The other noticeable feature is that the rate at which data are being archived increased by roughly a factor of 4.5 during 1995. Most of this increase comes from increased data flow from the FDSN and from several array components of IRIS.

In addition to the fully managed data, now totaling more than two terabytes, the DMC also has data from a variety of other sources (primarily from PASSCAL experiments) that are maintained and distributed as assembled data sets. As of January 1996, the PASSCAL program had contributed a total of 34 assembled data sets with a total volume of 91 gigabytes.

Meeting the Data Needs of the Seismological Research Community

Although a fundamental goal of the IRIS DMC in Seattle is to ensure the long-term viability of the data archive, the IRIS DMC has become one of the major sources of seismological data for the United States as well as the international seismological research community. For the past several years the IRIS DMC has distributed a greater volume of data to the seismological community than the IRIS GSN has generated. Perhaps this fact, more than any other, provides testimony to the active use of the IRIS DMS. In the initial planning stages of the IRIS DMS, it was projected that the IRIS DMC would service approximately 200 requests for GSN data per year.
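As a quick check on the archive figures quoted above, the following sketch recomputes the ratio of the two archiving rates and, as an added illustration, the mean size of a PASSCAL assembled data set. The input numbers are taken from the text; the variable names are illustrative only.

```python
# Archive growth rates quoted in the text, in gigabytes per year.
RATE_BEFORE_1995_GB = 330
RATE_DURING_1995_GB = 1500

# Ratio of the 1995 archiving rate to the earlier rate.
growth_factor = RATE_DURING_1995_GB / RATE_BEFORE_1995_GB

# PASSCAL assembled data sets as of January 1996, as quoted in the text.
ASSEMBLED_SETS = 34
ASSEMBLED_VOLUME_GB = 91

# Mean size of one assembled data set (an illustrative derived number).
mean_set_size_gb = ASSEMBLED_VOLUME_GB / ASSEMBLED_SETS

print(f"growth factor: {growth_factor:.2f}")
print(f"mean assembled data set: {mean_set_size_gb:.2f} GB")
```

The ratio comes out close to the "roughly a factor of 4.5" stated in the text, and the assembled data sets average a little under 3 gigabytes each.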
Figure 2 shows that the IRIS DMS has exceeded original expectations by more than two orders of magnitude; over 37,500 data shipments were made in 1995. Figure 2 also shows how the data shipments have continued to increase in a nearly exponential manner. The diagram includes all customized data and all assembled data products shipped.

[Figure 2: IRIS DMC Shipments, by year. Categories: Documents, Programs, FARM DC, FARM, Assembled, Customized]

Figure 2. The most dramatic measure of the success of the IRIS DMC is the number of requests for data, programs, and documentation that are serviced per year. This diagram reflects that growth. Customized requests are specific requests for selected portions of the archive; Assembled are items distributed as complete data sets; the FARM is the on-line collection of SEED volumes containing data from the largest events; FARM DC represents shipments of FARM volumes that went to data centers around the world and not to scientists.

There are a variety of different ways to measure the output of the IRIS DMS. Another clear estimate of its ever-increasing output is the number of individual seismograms shipped. In 1995, the IRIS DMC shipped nearly 20 million seismograms from only our two most active data distribution mechanisms, namely customized data requests and FARM products. Although 1989 and 1990 were extremely busy years for the IRIS DMC (more than 100,000 seismograms were shipped in 1990), the more recent years completely dwarf the early years of the DMC.

User Access Tools

One of the principal goals of the IRIS DMS is to provide easy data access to the worldwide seismological community. For this reason the IRIS DMS has developed a variety of tools to simplify making requests for waveform and parametric data. Figure 3 identifies the major data access methods developed at the IRIS DMC to gain access to its data holdings.
Any of these methods allows a user to make a customized request for specific samples of the archive.

DIRTS. The heart of the IRIS DMC system is the DIRTS database management system. This is an IRIS-developed Data Base Management System (DBMS) built upon the commercially available DBMS db_vista by Raima Corporation. It is a network database that provides extremely high-speed access to any information contained in various independent databases. Presently we store information in databases that are divided by specific networks and years. The only exception to the segmentation by network and year is for the IRIS DBMS itself, which includes the IRIS/IDA (II) subnetwork, the IRIS/USGS (IU) subnetwork, the GDSN (AS, SR, DW, HG, RS) network of the USGS, and, for historical reasons, the TERRAscope network (TS).

Figure 3. This diagram shows most of the user access tools that exist at the DMC. In general each tool has its particular strengths and is best suited for specific types of requests. Of particular importance are the two new access tools, WEED and CROP, since they represent the interfaces that should exhibit the greatest growth in usage in the future.

The databases themselves contain information about specific stations, the channels they record, the channels' responses to ground motion in minute detail, and comments. Additionally, each database contains information about seismic events: their locations, times, magnitudes, and other event information. The IRIS DMC presently manages a total of 85 databases for the various networks and years for which it has seismological data. The seismograms themselves are stored in robotic systems capable of storing a total of ten terabytes (10^13 bytes) of data. When users make requests via any of the access tools depicted in Figure 3, programs at the DMC extract information from the various databases, recover waveforms from the mass storage systems, and combine them to produce SEED volumes containing the requested information.

Although not specifically shown in Figure 3, the IRIS DMC Electronic Bulletin Board remains a frequently used method to contact the IRIS DMC and to invoke several of these access tools. A large amount of information about stations, events, hypocenter searching, and on-line manuals, as well as access to the methods of making customized requests for data, can be found in the main menu of the bulletin board.
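To make the segmentation of the databases by network and year more concrete, here is a minimal sketch of how a request window might be mapped onto per-network, per-year databases. It is not the DIRTS implementation: the function name, the (network, year) key layout, and the network code XX are hypothetical, and the sketch ignores the exception for the combined IRIS database described above.

```python
from datetime import datetime
from typing import List, Tuple


def database_keys(network: str, start: datetime, end: datetime) -> List[Tuple[str, int]]:
    """Return the hypothetical (network, year) keys that a time window touches."""
    if end < start:
        raise ValueError("end must not precede start")
    # One key per calendar year overlapped by the requested window.
    return [(network, year) for year in range(start.year, end.year + 1)]


# A two-hour window that straddles the 1994/1995 year boundary
# would need information from two separate databases.
keys = database_keys("XX", datetime(1994, 12, 31, 23, 0), datetime(1995, 1, 1, 1, 0))
print(keys)
```

Under this assumed layout, a request that crosses a year boundary draws on more than one database, which is one reason a single request may involve several independent lookups before a SEED volume is assembled.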
Specific manuals for access tools can be recovered electronically in one of three ways:

1. Sending e-mail to info@dmc.iris.washington.edu with the subject line "manuals" and the tool name (breq_fast, rumble, etc.) as the text
2. Accessing the manuals through anonymous ftp on machine dmc.iris.washington.edu in directory ~ftp/pub/manuals
3. Using the WWW Uniform Resource Locator for the IRIS DMC, http://www.iris.washington.edu

BREQ_FAST. The most frequently used access tool is BREQ_FAST. It is an e-mail-based tool that allows users to specify stations, channels, and time windows in a file with a specific format. When completed, the BREQ_FAST request file can be sent to the DMC by e-mail.

RUMBLE. Similar in use to BREQ_FAST is RUMBLE (Requests Users Make By Listing Events). This tool is also e-mail based. The user constructs a file of another format that identifies events of interest. For instance, users can make requests for data from earthquakes in a given geographic area or within given ranges of magnitude, depth, or time. Users can indicate a data preference for specific stations and channels or limit the data recovered to specific event-station distances or azimuths. It is an extremely powerful tool that users can master quickly.

XRETRIEVE. For users who have high-speed Internet access to the IRIS DMC (basically users in North America and, at certain times of the day, other users), two X-Window based tools are available. XRETRIEVE is similar in function to BREQ_FAST, except that users interact with a Graphical User Interface (GUI). Data requests for specific networks, stations, channels, and time periods can be made by pointing and clicking.

XTRACT. This is a very powerful X-Window based tool that gives users full access to all of the information in the DIRTS DBMS. It produces very complex X-Window displays and as such requires very good Internet connectivity. RUMBLE-type requests can be made using XTRACT, or specific pieces of information from the DBMS can be extracted and printed or saved to a file. XTRACT is currently being changed to a client-server architecture.

SPROUT. The IRIS DIRTS DBMS is a network database. Nevertheless, it provides a Structured Query Language (SQL) interface of the kind normally reserved for Relational Data Base Management Systems. Users who are familiar with or are interested in learning SQL can connect directly to the SPROUT system from within the IRIS DMC Electronic Bulletin Board.

SOD. Standing Order for Data (SOD) is a relatively new method of making routine data requests. SOD allows a user, normally a station operator or a seismologist with a specific need for data from specific stations, to make requests for data that the DMC has not yet received. The SOD request format is similar to the BREQ_FAST request file format, with a few additions. As data that match a SOD request arrive at the DMC, copies of the waveforms are stored in mini-SEED volumes. Periodically, as specified by the user, these mini-SEED volumes are transferred electronically or by tape to the requester.

WEED. Another new and powerful tool is WEED (Windows Extracted from Event Data), which in some ways is a seismological travel time calculator. WEED is driven by three files with specific formats. The STATION file contains information about seismic stations, their locations, the types of channels they record, etc. The WEED EVENT file contains information about seismic events, their locations, origin times, magnitudes, and Flinn-Engdahl regions. The final WEED file is called the DATA WINDOW DEFINITION file.
This file contains information about the time windows a user wishes to use for data extraction relative to modeled travel times. For instance, a user can specify time windows starting 60 seconds before the P phase and continuing for 5 minutes after the PKIKP phase. It is worthwhile to note that the STATION and EVENT files do not have to refer to stations and events managed at the IRIS DMC but could apply to a regional network, a PASSCAL experiment, or any other independent collection of instruments. The IRIS DMC maintains WEED station and event files in its anonymous ftp area for users wishing to make requests for data at the IRIS DMC or at other data centers able to process the BREQ_FAST format.

WEED is a program that runs on a seismologist's local workstation. GUI-based tools allow one to select specific stations and events based upon a large number of parameters, including station-event relationships. From the input STATION, EVENT, and DATA WINDOW DEFINITION files, WEED builds what is called a SUMMARY FILE that contains information about the various stations, events, and time windows of interest. WEED allows the user to hand-edit these files. By clicking on another button, a BREQ_FAST request is built and sent to the IRIS DMC or other data center. It is a new but extremely powerful tool.

FARM. The growth in the number of data requests serviced in the early years was truly remarkable. To a large degree it provided the incentive to develop the Fast Archive Recovery Method (FARM) of data access. Understanding the pattern of data requests over many years allowed the IRIS DMS to identify which data were of most interest to seismological researchers. Based on this understanding, the IRIS DMC routinely constructs data volumes in SEED format for all earthquakes larger than magnitude Mw = 5.7 (Mw = 5.5 for events deeper than 100 km) and places them on-line in an anonymous ftp area.
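The FARM selection rule just stated can be written as a small predicate. This is a sketch only: the function name is hypothetical, and reading "larger than" as a strict inequality is an assumption, since the text does not say how an event exactly at the threshold is treated.

```python
def qualifies_for_farm(mw: float, depth_km: float) -> bool:
    """Apply the magnitude thresholds quoted in the text for FARM volumes."""
    # Events deeper than 100 km use the lower threshold of Mw 5.5;
    # all other events use Mw 5.7.
    threshold = 5.5 if depth_km > 100.0 else 5.7
    # Strict comparison is an assumption about "larger than".
    return mw > threshold


# A shallow Mw 5.6 event falls below the cut, while the same
# magnitude at 150 km depth is above the deep-event threshold.
print(qualifies_for_farm(5.6, 10.0))
print(qualifies_for_farm(5.6, 150.0))
```

The lower threshold for deep events means that some earthquakes are included in the FARM only because of their depth.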
These volumes can then be accessed without having to interact with the IRIS DMC staff, at the anonymous FTP machine dmc.iris.washington.edu in directory pub/farm and then in specific subdirectories broken down by year and event time. Users can also directly access the FARM products using the World Wide Web at the URL http://www.iris.washington.edu/farm/farm.html

Data windows are generally at least one hour long, even for the broadband channels, and so there is a high probability that the desired data are in the FARM. Data shipment patterns indicate that this is the case. FARM products are stored in a RAID disk system at the DMC and therefore should have very high availability. Roughly 45 gigabytes of RAID disk space are dedicated to the FARM. Nevertheless, some less frequently accessed FARM products must be migrated off the RAID system into the DMC mass storage systems. Access tools still allow researchers to gain access to these off-line FARM products without the involvement of IRIS DMC staff.

CROP. The only disadvantage of the FARM is that some volumes are extremely large. For instance, the 1994 Bolivian earthquake FARM product is more than 100 megabytes in size. It is therefore too large to transfer electronically to most institutions. For this reason the Customized Reduction Of Products (CROP) access tool was created. This tool allows individuals to extract subsets of the data in a given FARM volume, produce a smaller volume in SEED format, and then electronically transfer that volume.

What's Next?

With the foundation of the IRIS DMS well established, the DMS is turning its attention to a variety of new issues.
During 1996 and 1997 IRIS should see progress in the following areas:

- Providing access to a wider variety of seismological data, including perhaps GPS and strong motion data
- Providing access to data from more regional networks
- Improving data flow from PASSCAL experiments
- Demonstrating the use of Inmarsat satellites to access very remote GSN stations for the SPYDER system
- Investigating routine application of a variety of procedures to data at the DMC
- Expanding the IRIS outreach to the K-12 community
- Developing software that can be used at distributed data centers for quality control and generation of SEED volumes
- Promoting networked data centers
- Developing software needed to handle the increasing volume of data

How to Reach the IRIS DMC

The IRIS Data Management System exists to support seismology within the United States and around the world. We encourage all seismologists to take advantage of the facilities we offer. We can be reached at

IRIS DMC
1408 NE 45th Street, Suite 205
Seattle, Washington 98105, USA

Our principal data access tools are reached through the electronic bulletin board:

telnet dmc.iris.washington.edu
userid: bulletin
password: board

To reach the data access tools, select the "r" option (request data) from the main menu of the bulletin board. The tools available there include xretrieve, xtract, sprout, and CROP. SPYDER™ is available directly from the main menu of the bulletin board.

Our World Wide Web Uniform Resource Locator (URL) is: http://www.iris.washington.edu

IRIS DMC
1408 NE 45th Street, #201
Seattle, WA 98105
E-mail: tim@iris.washington.edu (T. A.)