Astron. Nachr. / AN 000, No. 00, 1-6 (0000)

esdo Algorithms, Data Centre and Visualization Tools

E. Auden 1, T. Toutain 2, and S. Zharkov 3

1 Mullard Space Science Laboratory, University College London, UK
2 Physics Department, University of Birmingham, UK
3 Department of Applied Mathematics, University of Sheffield, UK

The esdo project is a UK e-science project funded by PPARC to develop solar algorithms, visualization tools and designs for a UK data centre that can be accessed through the UK virtual observatory, in preparation for the Solar Dynamics Observatory (SDO) mission in 2008. Algorithms available for use by the solar community include helioseismology applications, coronal feature recognition, and wave power analysis. Visualization tools will allow users to vary time ranges, cadence and resolution as they view streams of image data from the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI) instruments. Finally, a prototype UK data centre will demonstrate efficient UK user access to SDO data through both AstroGrid searches and integration with the global SDO data archive.

1 Introduction

The Solar Dynamics Observatory (SDO) mission will be launched in September 2008 with three instruments onboard: the Atmospheric Imaging Assembly (AIA), the Helioseismic and Magnetic Imager (HMI), and the Extreme Ultraviolet Variability Experiment (EVE). The AIA and HMI instruments will together produce one 4096 by 4096 pixel filtergram every 2 seconds; this cadence will result in approximately 2 TB of new data every day. Because of the SDO mission's large data volume, users will require sophisticated searching techniques and visualization tools that can help them efficiently locate scientifically interesting data.
The esdo project has been funded by PPARC for three years to integrate SDO data and resources with the UK virtual observatory (VO) through AstroGrid VO infrastructure. The three primary objectives of esdo are to develop solar algorithms for community use, to design a UK data centre that will optimize data access speed for UK users, and to create visualization tools that will expedite SDO data searches. The esdo consortium is led by the Mullard Space Science Laboratory (MSSL), and three other UK institutions are involved: the Rutherford Appleton Laboratory and the Universities of Birmingham and Sheffield.

2 Algorithms

2.1 Virtual Observatory Deployment

Nine esdo algorithms are under development by the esdo team:

- Coronal loop recognition
- Non-linear force-free magnetic field extrapolation
- Helicity computation
- Mode parameters calculation
- Subsurface flow analysis
- Perturbation map generation
- Local helioseismology inversion
- Coronal dimming region recognition
- Small events detection

While these algorithms will continue to be developed until project completion in September 2007, initial prototypes have been deployed through the AstroGrid Common Execution Architecture (CEA) system and can be accessed through the AstroGrid Workbench. Instructions for executing these algorithms are available on the esdo "Try Our Software" website. Although the algorithms will be tailored for SDO data closer to launch, members of the solar community can use data from TRACE, SOHO and, after February 2007, Hinode as input to the algorithms as of October. In addition to the nine algorithms listed, the esdo project is also working with solar physicists at the University of Warwick to deploy a coronal wave power analysis tool as an AstroGrid CEA application and to integrate this algorithm into the SDO streaming tool described in section 3.2. The AstroGrid CEA module exposes remote applications as web services. CEA is suitable for applications with command-line or HTTP GET/POST interfaces.
There are two caveats: applications must not have graphical user interfaces, and users cannot interact with the applications once they have been launched. CEA applications are executed on a remote server, and they can be accessed individually through the AstroGrid Workbench Task Launcher or built
into executable workflows through the Workbench Workflow Builder. The esdo algorithms deployed as CEA applications can accept variables (such as integers, floats, or strings) as well as files (from a local system, URLs, or AstroGrid's remote MySpace storage area) as inputs and outputs. In addition to AstroGrid CEA applications, esdo algorithms will also be made available to the solar community as C code modules in the Joint Science Operations Center (JSOC) processing pipeline. The C modules will also be wrapped in IDL for distribution through the MSSL SolarSoft gateway.

2.2 Global Helioseismology Algorithms

esdo consortium members at the University of Birmingham are focusing on the calculation of global helioseismology mode parameters from HMI dopplergrams. It is assumed that the dopplergrams will be pre-processed inside the JSOC pipeline to produce time series for low-degree p-modes; the esdo mode parameters algorithm will then calculate the response function, central frequency, linewidth, and power amplitude of specific target modes. The generation of time series from HMI dopplergrams uses the optimal mask technique developed by Toutain and Kosovichev to reduce the leakage of other modes into the target mode's signal (Toutain and Kosovichev, 2000). First, each dopplergram in the input series is binned. The target mode's signal is modelled and averaged within each dopplergram bin, and then an optimal mask is applied to the binned dopplergram that minimizes signal contributions from modes near the target mode. All dopplergrams in the input series can be cleaned of mode leakage in this manner to generate a time series of the target mode's signal. The application of a discrete Fourier transform to this time series will produce the target mode's power spectrum.
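The cleaning and transform steps above can be sketched in a few lines of Python. The array shapes, the uniform toy mask and the 45 s cadence below are illustrative assumptions, not the production algorithm, whose mask is derived from a model of the target mode's signal:

```python
import numpy as np

def apply_optimal_mask(binned_dopplergrams, mask):
    """Collapse each binned dopplergram into one cleaned signal value.

    binned_dopplergrams: (n_frames, n_bins) bin-averaged Doppler signals
    mask: (n_bins,) weights chosen to minimize leakage from nearby modes
    Returns a time series of length n_frames for the target mode.
    """
    return binned_dopplergrams @ mask

def power_spectrum(time_series, cadence):
    """Discrete Fourier transform of the cleaned time series."""
    spectrum = np.abs(np.fft.rfft(time_series)) ** 2
    freqs = np.fft.rfftfreq(len(time_series), d=cadence)
    return freqs, spectrum

# Synthetic example: a 3 mHz oscillation sampled at a 45 s cadence
cadence = 45.0
t = np.arange(4096) * cadence
frames = np.outer(np.sin(2 * np.pi * 3e-3 * t), np.ones(16))  # (frames, bins)
mask = np.full(16, 1.0 / 16)                                  # toy uniform mask
series = apply_optimal_mask(frames, mask)
freqs, spec = power_spectrum(series, cadence)
print("peak at %.1f mHz" % (freqs[np.argmax(spec)] * 1e3))    # peak at 3.0 mHz
```

In the real pipeline the mask weights vary across bins so that the weighted sum suppresses neighbouring modes while preserving the target mode's amplitude.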
Once a target mode's time series and power spectrum have been produced, the esdo algorithm can calculate the mode's frequency, linewidth and power amplitude parameters. First, the static and dynamic components of the target mode's response function, or leakage matrix, are computed with the adaptive response function method developed by Vorontsov and Jefferies (Vorontsov and Jefferies, 2005). Next, the total leakage matrix is used in conjunction with a table of p-mode frequencies and the target mode's degree, azimuthal order, and radial order to determine the optimal spherical harmonics coefficients. 42 GB of cleaned MDI frequencies have been prepared by T. Toutain for use as test data to validate the global mode parameters algorithm, and these datasets are available on the esdo server at MSSL. In addition, Toutain is developing a set of artificial medium-l time series to investigate the algorithm's utility with medium-l modes. These datasets can be used with the current AstroGrid CEA deployment of the algorithm to test the calculation of leakage matrix coefficients and optimal spherical harmonics parameters. Once the esdo mode parameters code has been slotted into the JSOC pipeline, Toutain's test datasets can also be used for comparison with the time series generated by the pipeline's dopplergram preprocessing.

2.3 Local Helioseismology Algorithms

University of Sheffield esdo consortium members are developing three local helioseismology algorithms. HMI three-dimensional tracked dopplergram datacubes will be used as inputs to two applications that produce either wave speed perturbation maps with in-and-out travel times or subsurface flow maps with skip-distance travel times. The travel times from both algorithms can be used as input to a third algorithm, local helioseismology inversion. Both the wave speed perturbation and subsurface flow map generation algorithms require preprocessing of the input tracked dopplergrams.
The user may apply one or more filters to the data: a difference filter, a low-pass filter, or an amplitude modulation correction. The filtered dopplergram data is then processed with a cross-correlation function, phase speed filtering is applied, and the user may specify the averaging scheme to use, either point-to-annulus or point-to-quadrant. Next, travel times can be extracted; if the user has specified the Gizon-Birch method, the travel time perturbation difference is calculated for a set of skip distances (Gizon and Birch, 2005). If the user has specified the Gabor wavelet fitting method, the travel time perturbation mean is computed for the skip distances (Kosovichev and Duvall, 1997). Both algorithms read skip distances from a configuration file supplied with the code, but a production version of the algorithm will allow users to supply their own skip distances. The subsurface flow application uses Gabor wavelet fitting and generates acoustic wave bulk subsurface flow travel times between any two points on the solar surface. The wave speed perturbation application calls the Gizon-Birch fitting method and produces the subsurface wave speed travel times between any two solar surface locations. The local helioseismology inversion algorithm allows the user to predict the behaviour of one or more internal solar parameters by constraining other parameters during an inversion of travel time data. The esdo local helioseismology inversion algorithm accepts as input travel times (either calculated by the subsurface flow and wave speed perturbation algorithms described above or obtained through other means) and sensitivity kernels supplied by the user. The user may also specify trade-off parameters for multi-channel deconvolution and horizontal regularisation. The resulting output is the three-dimensional inversion of either acoustic wave speed perturbation or subsurface flow data.
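To make the travel-time idea concrete, the sketch below estimates a point-to-point travel time from the lag of the cross-correlation peak. This deliberately simple peak estimate stands in for the Gizon-Birch and Gabor wavelet fitting methods named above, and the wave packets and 45 s cadence are synthetic:

```python
import numpy as np

def travel_time(signal_a, signal_b, cadence):
    """Estimate the travel time between two surface points as the lag
    that maximizes the cross-correlation of their Doppler signals."""
    n = len(signal_a)
    cc = np.correlate(signal_b - signal_b.mean(),
                      signal_a - signal_a.mean(), mode="full")
    lag = np.argmax(cc) - (n - 1)   # in samples; positive means b lags a
    return lag * cadence            # in seconds

# Synthetic example: the same wave packet arrives at point B 10 samples
# (450 s at a 45 s cadence) after it passes point A
t = np.arange(512)
a = np.exp(-0.5 * ((t - 200) / 8.0) ** 2)
b = np.exp(-0.5 * ((t - 210) / 8.0) ** 2)
print(travel_time(a, b, 45.0))      # 450.0
```

The production methods fit a model (a Gabor wavelet, or the Gizon-Birch formulation) to the cross-correlation rather than taking its raw peak, which gives sub-sample precision and separates mean and difference travel-time perturbations.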
Three-dimensional tracked dopplergrams from the SOHO-MDI instrument are used as test data for the subsurface flow and perturbation map generation algorithms. Following SDO launch, tracked 3-D HMI dopplergrams will be used as input to the AstroGrid CEA, JSOC pipeline and SolarSoft deployments of these two applications. The local helioseismology inversion algorithm can currently be tested with travel times generated from application of the Gizon-Birch or Gabor wavelet fitting methods to tracked 3-D MDI dopplergrams, but it will also use HMI data as input to the production AstroGrid, JSOC and SolarSoft distributions. The local helioseismology inversion application would benefit from a community repository of sensitivity kernels.

3 Visualization Tools

Visualization tools are GUI applications that allow scientists to view large volumes of data rapidly. Their purpose is to reduce the time spent downloading, processing and manually evaluating full-resolution datasets in the search for specific regions or time ranges of interest. Scientists can watch movies, skim tables of thumbnail images, query catalogues, and zoom in to look at features visible in high-resolution images in order to pinpoint specific datasets for further processing. Web browser accessible archives of ready-made thumbnail images and movies have benefited users of SOHO instruments such as MDI. With a high-cadence mission like SDO, availability of searchable low-resolution images and movies will minimize the network traffic placed on the SDO data centre. The primary SDO data archive will produce ready-made thumbnail images and movies for AIA and HMI regions of interest. Three visualization tools are under development by the esdo project that will allow users to generate their own region of interest images and movies: an image gallery tool, the SDO streaming tool, and a movie maker. All three tools are designed to display data located through AstroGrid client-side search mechanisms such as HelioScope.
HelioScope displays all URLs of datasets matching the time range and instrument request. Once a user has identified one or more datasets to view, the matching URLs are relayed from HelioScope to the visualization tool via the PLASTIC ivo://votech.org/fits/image/loadfromurl message.

3.1 Image Gallery Tool

The image gallery tool displays thumbnail images and basic metadata for all AIA images, HMI line-of-sight magnetograms, or HMI dopplergrams within a given start and stop time range. Currently, this tool is implemented as a client-side Java Web Start application that a user can launch from the esdo website. Once the image gallery tool has been started, a user can select the start time, stop time and instrument through the AstroGrid HelioScope application. The image gallery tool downloads all highlighted datasets to the user's machine and displays a table containing a 256 by 256 pixel thumbnail image along with observation date, filename, instrument name, and wavelength (or magnetic field where applicable) for each dataset.

Fig. 1: Screenshot of the SDO streaming tool prototype displaying a coronagraph image from the SOHO-LASCO instrument.

3.2 SDO Streaming Tool

The SDO streaming tool also works in conjunction with HelioScope to display images and basic metadata on a user's machine. However, the streaming tool displays one image in a 512 by 512 pixel display panel, and a user can zoom in or out through four levels of resolution for a full-resolution 4096 by 4096 pixel AIA or HMI image. In addition, the user can pan spatially around the image in eight directions. Movie mode functionality is currently under development; after specifying a start and stop time range, the user will be able to step forward, step backwards, pause, play forward, or rewind all images within that time range. Like the image gallery tool, the streaming tool prototype is a client-side Java Web Start application that downloads datasets to a user's machine before displaying the associated images.
However, the production version will be a server-side application, accessible through a web browser, that will cache images in a remote filestore and stream them to the user's machine. Not only will this provide a smoother transition between images played in movie mode, it will also reduce the amount of memory utilized on the user's machine.
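Both the image gallery thumbnails and the streaming tool's lower zoom levels need downsampled versions of the full-resolution 4096 by 4096 images. One plausible approach is block averaging, sketched below; this is an illustrative assumption rather than the resampling scheme the tools actually use:

```python
import numpy as np

def make_thumbnail(image, size=256):
    """Downsample a square image to size x size by averaging
    non-overlapping blocks of pixels."""
    n = image.shape[0]
    block = n // size                       # 4096 // 256 = 16
    return image.reshape(size, block, size, block).mean(axis=(1, 3))

full = np.ones((4096, 4096))                # stand-in for an AIA/HMI image
thumb = make_thumbnail(full)
print(thumb.shape)                          # (256, 256)
```

Because each zoom level is a fixed power-of-two reduction, a server-side cache can precompute and store the downsampled levels once and serve them to every subsequent viewer.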
3.3 Movie Maker

The third esdo visualization tool under development is a movie maker tool. Like the image gallery and the SDO streaming tools, the input to this application is a set of URLs matching start and stop time ranges located by the AstroGrid HelioScope tool. Images are extracted from the remote datasets and saved as animated GIFs or MPEGs that the user can then view, save and add to presentations or websites. Although region-of-interest movies will be generated by the JSOC pipeline, the esdo movie maker tool will allow users to create movies tailored to time ranges or regions of interest pertinent to their own research. The movie maker tool is being developed as a server-side application rather than a client-side Java Web Start application. Once the movie maker runs successfully as a server-side application, the image gallery tool and SDO streaming tool will also be migrated to server-side execution. A client-side application's code runs on the user's local machine, whereas a server-side application is executed on a remote machine. There are three advantages to server-side visualization tools compared to client-side tools. First, intensive image processing routines can take advantage of a more powerful remote computer. Second, datasets are downloaded to the remote machine on high-speed academic networks such as the UK's SuperJANET backbone rather than over a user's local ethernet or wireless connection. Third, images can be cached in a central repository for reuse by other users rather than initiating a new download request and image processing execution each time an image is viewed. Future work on the esdo visualization tools will focus on the migration of all three tools to a single server-side application that can be accessed through a web browser.
Developers on the esdo and AstroGrid projects are working on a Java library that will allow client-side applications such as the AstroGrid HelioScope tool to send PLASTIC messages to the server-side esdo visualization package and other remote web browser accessible applications. In addition, functionality from existing algorithms, such as coronal loop recognition and coronal wave power analysis, will be added to the streaming tool.

4 UK Data Centre

The esdo data centre workpackage will deliver a working prototype and production-level design for a UK partial mirror of SDO data that can be accessed through AstroGrid data searches. AstroGrid users can locate data in two ways: using the HelioScope tool, or building a workflow that queries a DataSet Access (DSA) module.

4.1 AstroGrid Data Access and Provision

The HelioScope tool is a client-side application that can be accessed through the AstroGrid Workbench. HelioScope allows users to perform queries based on start time, stop time, and dataset type (graphical image or time series) using the Simple Time Access Protocol (STAP) (Dalla and Benson, 2006). HelioScope locates all solar and solar-terrestrial instrument archives with live STAP services. STAP services that host datasets conforming to the user's input time range and data type request are displayed to the user as instrument names. The user can click each instrument name to see matching datasets' timestamps, URLs and basic metadata. Finally, the user can highlight one or more datasets to download to a local machine, upload to virtual storage space in AstroGrid MySpace, or send to another client-side application for further processing. To make a data archive available through the HelioScope tool, the data provider can deploy a web service conforming to the STAP schema. The STAP service must respond to start and stop time queries with URLs pointing to individual instrument files.
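At its core, a STAP service is a time-range query that returns dataset URLs. The Python sketch below illustrates that idea with an invented in-memory archive and example URLs; it does not implement the real STAP schema or its web service wrapping:

```python
from datetime import datetime

# Invented example archive: (observation time, dataset URL) pairs
ARCHIVE = [
    ("2006-10-01T00:00:00", "http://example.org/sdo/aia_20061001_0000.fits"),
    ("2006-10-01T12:00:00", "http://example.org/sdo/aia_20061001_1200.fits"),
    ("2006-10-02T00:00:00", "http://example.org/sdo/aia_20061002_0000.fits"),
]

def stap_query(start, stop):
    """Return URLs of all datasets observed within [start, stop]."""
    t0, t1 = datetime.fromisoformat(start), datetime.fromisoformat(stop)
    return [url for ts, url in ARCHIVE
            if t0 <= datetime.fromisoformat(ts) <= t1]

# A query for 1 October 2006 matches the first two datasets
urls = stap_query("2006-10-01T00:00:00", "2006-10-01T23:59:59")
print(len(urls))                            # 2
```

A real deployment would expose this lookup over HTTP GET and back it with the archive's own index rather than an in-memory list.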
Web service implementation is left to the data provider, although deployment through Axis with the Tomcat application server is recommended. The DSA module permits more detailed queries of data archives. To make data available through DSA, data providers should store metadata describing each dataset (for instance, FITS header keyword/value pairs) in a relational database table. The DSA module can then be configured for read-only access to that database table before deployment through the Tomcat application server. Users submit Astronomical Data Query Language (ADQL) queries (Ohishi and Szalay, 2005) to the DSA module, and query responses may be formatted as HTML, plain text or XML files formatted as VOTables (Ochsenbein et al., 2004). In the case of remote files such as FITS files, images, or text files, the table can include URLs pointing to each dataset. Users can retrieve files by building a workflow that first queries the DSA for a VOTable of URLs and then processes the VOTable to download the files locally, upload them to MySpace, or send them to another application. For data archives in the form of catalogues, such as event or object lists, the combination of DSA and relational database is sufficient to generate a VOTable containing the subset of catalogue entries corresponding to a user's request.

4.2 JSOC DRMS Software

The JSOC data centre software includes a Data Resource Management System (DRMS) and a Storage Unit Management System (SUMS). SDO metadata is stored in instances of DRMS with relational databases, such as PostgreSQL or Oracle, as the backend. Datasets themselves are held as file units in instances of SUMS deployed with file systems. All instances of DRMS hold metadata for the global SDO data archive along with pointers to files in specific SUMS instances. An instance of SUMS can hold a full copy or a fraction of the SDO archive. Both a STAP web service and a DSA module will be hosted in the UK and integrated with a full deployment of DRMS and a small SUMS.

4.3 UK SDO Data Centre Design

During the Phase A research period of the esdo project, a data centre design and prototype were developed assuming that a partial mirror of SDO data would be held on tape at the Atlas Data Storage (ADS) facility at the Rutherford Appleton Laboratory. The Phase A data centre design included 30 TB of tape storage connected to a jukebox machine capable of caching up to 1 TB of data. The 30 TB of tape storage would be divided between 15 TB holding a rolling cache of the most recent 60 days' worth of AIA and HMI data, while the remaining 15 TB would act as a true cache for popular datasets older than 60 days. A Phase A data centre prototype demonstrated interoperation between the ADS facility and an AstroGrid DSA module hosted at MSSL. 10 MB of test data from the TRACE instrument were uploaded to tape storage at ADS. Next, a servlet was installed on a RAL jukebox machine that exposed each test file in the tape store as a URL. When a file was requested, the servlet would either return the file from cache or, if necessary, execute the Atlas tape command to retrieve the file from tape storage before passing it to the user. FITS header keyword/value pairs were extracted from each test file and stored in a MySQL database table at MSSL; this table was configured for read-only access by an AstroGrid DSA module, also hosted at MSSL.
The final prototype allowed a user to execute an AstroGrid workflow that first submitted an ADQL query to the DSA, then returned a VOTable of URLs pointing to test files in the ADS tape storage, and finally retrieved the files to AstroGrid MySpace. Network considerations have prompted a redesign of the UK esdo data centre. Network tests performed in April 2006 between Stanford University in the US and UCL, MSSL and RAL in the UK demonstrated that files were transferred approximately three times as quickly between UK institutions on the SuperJANET backbone as over the transatlantic network (Auden, 2006). These tests concluded that UK users would experience faster download speeds retrieving data from an archive on the SuperJANET backbone than they would downloading the same data from Stanford. Download speeds would also be positively affected during periods of high network traffic to the SDO data centre directly following solar events such as flares and coronal mass ejections. However, the ADS tape facility requires a one-minute overhead for every new tape accessed in the robotic tape facility. Any network gains made for UK users by archiving SDO data in the ADS tape facility would be reduced by the tape access overheads. In addition, the constant data requests required to update the UK data centre's rolling 60-day cache of AIA and HMI datasets would increase the level of background network traffic to the US data centre. The most efficient network and storage design for the UK data centre is therefore to designate a hard disk cache that can act as a proxy server for SDO data requests initiated within the UK. The first time a dataset is requested from the UK, either through a HelioScope STAP service request or through a workflow DSA query, the data will be transferred from the US data centre and cached in the UK data centre as it is passed to the user.
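This fetch-or-cache behaviour can be sketched as follows; the cache directory, dataset name and fetch function are placeholders, and a production proxy would also evict datasets as they lose popularity:

```python
import os
import tempfile

CACHE_DIR = tempfile.mkdtemp()              # placeholder for the UK disk cache

def fetch_from_us_archive(dataset_id):
    """Stand-in for a transatlantic transfer from the US data centre."""
    return b"FITS data for " + dataset_id.encode()

def get_dataset(dataset_id):
    """Serve a dataset from the UK cache, fetching and storing it on a miss."""
    path = os.path.join(CACHE_DIR, dataset_id)
    if os.path.exists(path):                  # cache hit: serve from UK disk
        with open(path, "rb") as f:
            return f.read()
    data = fetch_from_us_archive(dataset_id)  # cache miss: fetch from the US
    with open(path, "wb") as f:               # store for later UK requests
        f.write(data)
    return data

first = get_dataset("hmi_20061001.fits")    # fetched and cached
second = get_dataset("hmi_20061001.fits")   # served from the cache
print(first == second)                      # True
```

Only the first UK request for a given dataset crosses the Atlantic; every later request is served at SuperJANET speeds from the local disk.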
The proxy server will behave as a true cache: only datasets of interest to users will be stored in the UK, and datasets will only be stored as long as they remain popular. This design can efficiently use a smaller volume of storage; not only will 5 TB of hard disk be sufficient to cache HelioScope and DSA data requests, but this space can also be used to cache processed images and movies used by the esdo visualization tools.

5 Conclusions

The HELAS network will facilitate the sharing of helioseismology data, algorithms and expertise between European researchers. The esdo project can contribute global and local helioseismology algorithms, visualization tools, and designs for a data cache integrated with both the JSOC global archive and the AstroGrid virtual observatory. Solar algorithms developed by the esdo project will be deployed as AstroGrid CEA applications and JSOC pipeline modules. The open source C code will also be wrapped in IDL for distribution as SolarSoft routines. In addition to coronal and magnetic algorithms, the esdo team will also deliver applications to calculate global mode parameters from HMI time series as well as local helioseismology inversions, subsurface flow analysis, and wave speed perturbation map generation. SDO data will be available from the US archive in California, but data cached in a UK proxy server will not only increase data retrieval speeds during high-traffic periods, it will also expedite image display for esdo visualization tools such as the streaming tool. The esdo project is currently building a reference implementation of a remote DRMS and partial SUMS that will be searchable through both the JSOC global archive and AstroGrid search mechanisms such as AstroScope and DSA. Finally, the esdo visualization tools - the streaming tool, image gallery and movie maker - will help researchers visually identify datasets relevant to the time ranges or phenomena of interest to individual scientists.
All three tools will be integrated with AstroGrid HelioScope so that
SDO data located with the virtual observatory can be easily viewed with the esdo tools, and any images, movies or science products generated with these visualization tools can be saved in VO virtual storage or further analysed with other VO applications.

References

Auden, E.: 2005, esdo Network Latency Tests, SDO/NetworkLatencyTests
Dalla, S., Benson, K.: 2006, Proposal for a Simple Time Access Protocol, V0.1, Astrogrid/SimpleTimeAccessProtocol
Gizon, L., Birch, A.C.: 2005, Living Reviews in Solar Physics, 2, 6
Kosovichev, A.G., Duvall, T.L., Jr.: 1997, in Pijpers, F.P., Christensen-Dalsgaard, J., Rosenthal, C.S. (eds), Astrophysics and Space Science Library, 225
Ochsenbein, F., Williams, R., Davenhall, C., Durand, D., Fernique, P., Giaretta, D., Hanisch, R., McGlynn, T., Szalay, A., Taylor, M., Wicenec, A.: 2004, VOTable Format Definition Version 1.1, IVOA Recommendation, latest/vot.html
Ohishi, M., Szalay, A.: 2005, IVOA Astronomical Data Query Language Version 1.01, IVOA Working Draft, latest/adql.html
Toutain, T., Kosovichev, A.G.: 2000, ApJ, 534, 2
Vorontsov, S.V., Jefferies, S.M.: 2005, ApJ, 623, 1