ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS) Jessica Chapman, Data Workshop March 2013
ASKAP Science Data Archive Talk outline Data flow in brief Some radio astronomy terms Archive overview ASKAP operations Science users and use cases Requirements summary
Murchison Radio Observatory Pawsey Centre Worldwide Central processor: Processes the raw visibilities and outputs science data products. Science Data Archive Facility: Data storage and access to science data products
ASKAP Central Processor Data from same set of visibilities can be passed through three pipelines for Transient, Continuum and Spectral Line imaging. In principle this allows up to three different experiments to be scheduled simultaneously.
Digression: Some radio astronomy terms [Figure: visibilities, Fourier transform, images; image axes are longitude and latitude] Radio astronomy images are created using a Fourier Transform of the observed visibilities. A single image is a map of a region of sky. This may show the sky brightness or another measured parameter. An image cube is a set of images for one region covering a range of frequency (spectral) channels.
[Figure: OH spectral line emission from an evolved star] Continuum image cubes (small-n) have up to 300 frequency channels (300 x 1 MHz). Spectral line image cubes (large-n) have up to 16,200 spectral channels. Transient observations generate one image every 5 seconds.
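As a toy illustration of the visibility/image relationship, the sketch below builds a tiny model sky, Fourier transforms it to a grid of "visibilities", and transforms back to recover the image. It assumes an idealised, fully sampled visibility grid; real imaging has to cope with incomplete sampling, gridding and deconvolution, none of which is shown. The function name `dft2` and the 8 x 8 grid are illustrative choices only.

```python
import cmath

def dft2(grid, inverse=False):
    """Direct 2-D discrete Fourier transform of a square list-of-lists grid."""
    n = len(grid)
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for x in range(n):
                for y in range(n):
                    phase = sign * 2j * cmath.pi * (u * x + v * y) / n
                    total += grid[x][y] * cmath.exp(phase)
            # The inverse transform carries the 1/n^2 normalisation.
            out[u][v] = total / (n * n) if inverse else total
    return out

# Toy 8 x 8 "sky" with two point sources of brightness 1.0 and 0.5.
N = 8
sky = [[0.0] * N for _ in range(N)]
sky[2][3] = 1.0
sky[5][6] = 0.5

# Idealised, fully sampled visibilities (an assumption of this sketch).
visibilities = dft2(sky)

# Imaging step: transform the visibilities back to the sky plane.
image = dft2(visibilities, inverse=True)
recovered = [[round(abs(image[x][y]), 6) for y in range(N)] for x in range(N)]
```

The zero-spacing visibility `visibilities[0][0]` equals the total flux of the model sky, and `recovered` matches `sky` to within rounding.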
ASKAP Data Processing levels
ASKAP Science Data Archive (ASDA) ASKAP responsible for storage and access to level 5 data products. Survey Science Project Teams responsible for level 6 data validation. Astronomers and their science teams responsible for generation of level 7 enhanced data products.
ASDA: Schematic [Figure labels: Data Management Layer; HSM; Tape libraries 2 x 20 PB; Disk storage > 4 PB; Science Database Nodes; Portal Nodes; Database cache storage 24 TB; Web Server 1; Web Server 2; Web Server 3; VO Services; Firewall; Storage ~ 50 TB; International astronomy community]
ASDA Data Flows Schematic
ASDA Components Hierarchical Storage Management System with: 2 x 20 PB Tape Libraries (one as backup of other) several PB disk storage Science Database Management System 4 science database nodes with ~ 50 TBytes of storage Web servers with ~ 24 TBytes for cached data storage Virtual Observatory interface to external users Workflow processes User support and training Access to High Performance Computing facilities
ASDA: Data Management Layer The HSM will control the transfer of data to and from storage media. An additional data management layer is needed for the science archive. This will include software for: Generation of science metadata and updates to the science database User registration and data access controls User services including web interfaces, search forms, provision of Virtual Observatory services. Workflow management of interactions between the archive, the Central Processor, HSM and users.
Science Data Access Access to files, tables and search results will be provided through Virtual Observatory services: Simple Image Access Protocol: Returns links to images/cubes identified for a given position. Can generate image cut-outs. Cone searches: Return table results such as positions and fluxes for sources detected within an area around a given position. Table Access Protocol: Allows for complex querying of tables. For example, it could return a list of detections for sources above a given flux density with negative spectral indices (slopes). Note: Radio astronomy is less familiar with using VO services than optical astronomy. Full adoption of VO services will require a cultural shift for users.
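The two table-oriented services can be sketched locally in a few lines. This is not the VO protocol itself, only the selection logic a cone search and a Table Access Protocol style query would apply on the server side. The detection table, its column names and the source names are all hypothetical.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

# Hypothetical detection table (positions in degrees, flux density in mJy).
DETECTIONS = [
    {"name": "src_a", "ra": 187.50, "dec": -45.00, "flux_mjy": 12.0, "spectral_index": -0.7},
    {"name": "src_b", "ra": 187.60, "dec": -45.05, "flux_mjy": 0.8, "spectral_index": -0.9},
    {"name": "src_c", "ra": 187.55, "dec": -44.98, "flux_mjy": 5.0, "spectral_index": 0.3},
    {"name": "src_d", "ra": 190.00, "dec": -40.00, "flux_mjy": 30.0, "spectral_index": -1.1},
]

def cone_search(rows, ra, dec, radius_deg):
    """Rows whose position lies within radius_deg of (ra, dec)."""
    return [r for r in rows
            if angular_separation_deg(ra, dec, r["ra"], r["dec"]) <= radius_deg]

def bright_negative_index(rows, min_flux_mjy):
    """Rows above a flux density threshold with a negative spectral index."""
    return [r for r in rows
            if r["flux_mjy"] > min_flux_mjy and r["spectral_index"] < 0]

nearby = [r["name"] for r in cone_search(DETECTIONS, 187.5, -45.0, 0.2)]
selected = [r["name"] for r in bright_negative_index(DETECTIONS, 1.0)]
```

Here `nearby` holds the three sources within 0.2 degrees of the search position, and `selected` holds the sources brighter than 1 mJy with negative spectral index.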
ASDA Primary Data Products
Product                                                              Data type
Calibrated continuum visibility data (stored as measurement sets)    File
Continuum (small-n) images and image cubes                           File
Spectral line (large-n) image cubes                                  File
Postage stamp images (quick look images produced with fewer pixels)  File
Continuum source detection tables                                    Table
Spectral line source detections                                      Table
Transient source detections                                          Table
Transient light curves                                               Table
Project Source Catalogues                                            Table
Examples: File-based Data Products
Product: 12-hr calibrated continuum visibility data set
  Parameters used: 36 beams, 300 channels, 4 polarisations, 666 baselines, 5 s integration
  Data size: 2.2 TBytes
Product: Set of three continuum images (restored, residual and model)
  Parameters used: Imsize = 10,800 pix, 1 pol, 1 spectral channel, 3 images per position
  Data size: 5.4 GBytes
Product: Set of four polarisation continuum image cubes
  Parameters used: 300 spectral channels, 4 pols
  Data size: 556 GBytes
Product: Single spectral line image cube
  Parameters used: Imsize = 3,600 pix, 16,200 channels, 1 pol
  Data size: 839 GBytes
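The quoted sizes can be checked roughly from the parameters. The sketch below assumes 4-byte floating-point pixels and 8-byte complex visibilities, and assumes that the 3,600-pixel image size applies to the spectral line cube; none of these assumptions is stated on the slide. It also ignores weights, flags and metadata, so the visibility estimate is a lower bound.

```python
BYTES_PER_PIXEL = 4        # assumption: 32-bit floating-point pixels
BYTES_PER_VISIBILITY = 8   # assumption: 64-bit complex (two 32-bit floats)

def cube_bytes(imsize_pix, n_channels, n_pols):
    """Raw pixel volume of a square image cube, in bytes."""
    return imsize_pix ** 2 * n_channels * n_pols * BYTES_PER_PIXEL

def visibility_bytes(n_beams, n_channels, n_pols, n_baselines, hours, integration_s):
    """Raw visibility volume for one observation, in bytes."""
    n_integrations = int(hours * 3600 / integration_s)
    return (n_beams * n_channels * n_pols * n_baselines
            * n_integrations * BYTES_PER_VISIBILITY)

# Single spectral line image cube: 3,600 pix, 16,200 channels, 1 pol.
spectral_cube = cube_bytes(3600, 16200, 1)

# 12-hr continuum visibility set: 36 beams, 300 channels, 4 pols,
# 666 baselines, 5 s integration.
vis_set = visibility_bytes(36, 300, 4, 666, 12, 5)

print(f"spectral line cube: {spectral_cube / 1e9:.1f} GB")
print(f"visibility set:     {vis_set / 1e12:.2f} TB")
```

Under these assumptions the cube comes to about 839.8 GB, in line with the quoted 839 GBytes, and the visibility set to about 1.99 TB, somewhat below the quoted 2.2 TBytes; the difference would be consistent with per-visibility overheads not modelled here.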
ASKAP Operations Longest baseline is 6 km. 36 antennas give 630 baselines.
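The baseline count follows from the number of distinct antenna pairs, N(N-1)/2. The short check below also shows one possible reading of the 666 baselines quoted in the file-based data products example: 630 cross-correlations plus 36 autocorrelations. That reading is an interpretation, not something the slides state.

```python
def cross_baselines(n_antennas):
    """Number of distinct antenna pairs: N * (N - 1) / 2."""
    return n_antennas * (n_antennas - 1) // 2

n_antennas = 36
cross = cross_baselines(n_antennas)   # pairs of different antennas
with_autos = cross + n_antennas       # plus one autocorrelation per antenna

print(cross, with_autos)
```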
ASKAP Operations Operated as part of the Australia Telescope National Facility (ATNF) Current ATNF Facilities are: Parkes radio telescope, Australia Telescope Compact Array, Mopra radio telescope, Long Baseline Array National Facility time is allocated on basis of scientific merit and technical feasibility Data taken on ATNF facilities are owned by CSIRO All ASKAP observations taken remotely (usually from Marsfield SOC) Science data available to all astronomers, in most cases with NO proprietary period
ASKAP Archive Users
Astronomy Group                          Number of individuals
Member of Survey Science Project teams   350
Member of Guest Science Project team     400
General astronomy community              750+
CASS Astrophysics group                  30
Supported by
CASS Science Operations                  30
ivec Operations                          ~15
Technical developers                     tbd
ASKAP Science Survey Projects (SSPs) Projects need at least 1500 hours of telescope time 10 SSPs were selected in 2009. Their science goals are drivers of ASKAP development. SSP teams will work with CASS staff on aspects of science and archive commissioning. Project teams are responsible for their data validation i.e. will assess whether the data taken are of science quality. For bad data, observations may be retaken.
ASKAP science: EMU (C) Evolutionary Map of the Universe A deep radio continuum (small-n) survey that will study the evolution of star formation and massive black holes in galaxies on cosmological timescales. The survey is expected to detect about 70 million sources. EMU observations and archiving Observations taken in 12-hour schedule blocks with one block per field. Visibility data stored for 4 polarisation products and 300 spectral channels. Images archived for 1 pol and 1 spectral channel. 11 images per field for full characterisation of spectrum and image quality. Source detection tables stored.
ASKAP science: WALLABY (S) WALLABY: a blind spectral line (large-n) survey for neutral hydrogen. Survey is expected to detect spectral line emission from over 500,000 galaxies. WALLABY data processing and archives Central Processor will process 97 PB of visibility data (NOT archived!). WALLABY images made with 1 pol and 16,200 spectral channels. 3 images per field archived (residual, restored and model). Two sets of postage stamp images also archived. Each field divided into 350 and 1000 sub-regions. These allow for quick looks of smaller regions.
ASKAP science: VAST (T) VAST: a survey for variable sources and slow transients. Examples are flare stars, intermittent pulsars, X-ray binaries, variable extragalactic sources. VAST data processing and archives VAST observations will mostly be taken in a piggy-back mode, using the same input data stream taken for other projects. Visibility data are averaged to ~30 channels. VAST data produce one image every five seconds. Images are searched for sources and detections written to tables as rows. Typically about 500 to 1000 source detections in 5 seconds. Images are NOT archived.
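To give a feel for the table volumes this implies, the sketch below multiplies out the quoted rates. It assumes an 8-hour field (the VAST value in the SSP data sizes table) and that every 5-second image yields 500 to 1000 detections; both the function names and the assumption of a constant detection rate are illustrative.

```python
def images_per_field(hours, cadence_s=5):
    """Number of images produced at one image per cadence_s seconds."""
    return int(hours * 3600 / cadence_s)

def detection_rows(hours, per_image_low=500, per_image_high=1000, cadence_s=5):
    """Lower and upper estimates of table rows written for one field."""
    n_images = images_per_field(hours, cadence_s)
    return n_images * per_image_low, n_images * per_image_high

n_images = images_per_field(8)
rows_low, rows_high = detection_rows(8)

print(f"{n_images} images, {rows_low:,} to {rows_high:,} detection rows per field")
```

Under these assumptions a single field produces a few million table rows, which is why detections are stored as rows while the images themselves are discarded.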
Survey Science Projects: File-based data sizes in ASDA
SSP      Type  Nfields  Time per field (hours)  Visibility data size (PB)  Image data size (PB)
EMU      C     1200     12                      2.7                        0.003
POSSUM   C     1200     8                       1.8                        1.2
WALLABY  S     1200     8                       NA                         2.1
DINGO    S     966      8                       NA                         1.3
FLASH    S     850      4                       NA                         0.04
GASKAP   S     644      12                      NA                         0.9
VAST     T     1200     8                       Small                      NA
Notes: Type is C (continuum), S (spectral line) or T (transient). Data sizes for two other SSPs, CRAFT (fast transients) and COAST (pulsars), are not included here.
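Summing the table gives a rough total for file-based SSP data, which can be compared with the 20 PB capacity of one tape library. In the sketch below, "NA" and "Small" entries are treated as zero, which is an assumption made only for this estimate; the dictionary name is illustrative.

```python
# (visibility PB, image PB) per SSP, copied from the table;
# "NA" and "Small" entries are treated as 0.0 (assumption).
SSP_SIZES_PB = {
    "EMU":     (2.7, 0.003),
    "POSSUM":  (1.8, 1.2),
    "WALLABY": (0.0, 2.1),
    "DINGO":   (0.0, 1.3),
    "FLASH":   (0.0, 0.04),
    "GASKAP":  (0.0, 0.9),
    "VAST":    (0.0, 0.0),
}

TAPE_LIBRARY_PB = 20  # capacity of one tape library, from the ASDA components

total_vis = round(sum(vis for vis, _ in SSP_SIZES_PB.values()), 3)
total_img = round(sum(img for _, img in SSP_SIZES_PB.values()), 3)
total = round(total_vis + total_img, 3)
fraction_of_library = total / TAPE_LIBRARY_PB

print(f"visibilities {total_vis} PB, images {total_img} PB, total {total} PB")
print(f"fraction of one tape library: {fraction_of_library:.0%}")
```

On these figures the listed SSPs account for roughly 10 PB, about half of one tape library, before table data, CRAFT, COAST and Guest Science Projects are counted.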
Guest Science Projects Will be allocated up to 25% of time. May be scheduled together with SSPs (using parallel data processing pipelines). Science cases not yet established. Proposals will be submitted as for other ATNF facilities on a 6-month cycle. For 750 hours per semester, can expect about 5 to 10 GSPs to be scheduled.
ASDA Example Use Cases
Group: General use
  Expertise level: Low
  Example use cases (preliminary):
    Download small number of full-sized images
    Download larger numbers of postage stamp images
    Cone searches to identify sources of interest
    Carry out more complex table searches
    Download light curve information for given source(s)
    Comparison of ASKAP data with data from other facilities
Group: GSP teams
  Expertise level: Medium
  Example use cases (preliminary): As above plus:
    Some post-data processing may be requested
    Some data visualisation
Notes: Many (but not all) requirements may be largely satisfied through web + VO services.
ASDA science use: Survey Science Projects
Group: SSP teams
  Expertise level: High
  Example use cases (preliminary): As for the general use and:
    Production of final Source Catalogues
    High-volume data downloads for data validation
    Re-analysis of visibility data to generate different images with different parameters
    Data visualisation across large sky regions
    Post-analysis of image cubes: image stacking
    Rapid follow-up observations of transient sources
Notes: Most SSP teams are likely to require access to High Performance Computing and temporary data storage. HPC services are outside the scope of the CASS ASKAP project. They can be provided by ivec and other groups.
Preliminary ASDA requirements Essential Requirements Data ingest from CP as the data products become available. Data transfer to tape and/or disk. Indefinite data storage, with backups for all archive files, tables and associated metadata. Potential for data archive to be mirrored to other locations. Disaster recovery plans to minimise loss of data following a crisis such as fire or flood. Efficient data retrieval from tapes and/or disks. Fast access to some data products (source catalogues, tables, some images, etc.). Extensible and scalable: Design should allow for additional data products, increased data storage and new technologies. User registration through OPAL (as for other ATNF facilities). Validation tool for SSP teams (set metadata flag).
Essential Requirements (continued) User queries through web interfaces. Data retrieval across internet using VO services. Queue management for multiple user requests. Administration-level access and tools for Operations staff from CASS and ivec. Highly Desirable Requirements ASKAP provision for SSP teams to upload final source catalogues. User support and access to HPC facilities through ivec.
Thank you Jessica Chapman CSIRO Astronomy and Space Science t +61 2 9372 4196 e Jessica.Chapman@csiro.au w atnf.csiro.au