ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS)




Jessica Chapman, CSIRO Astronomy and Space Science (CASS)
Data Workshop, March 2013

ASKAP Science Data Archive: talk outline
Data flow in brief; some radio astronomy terms; archive overview; ASKAP operations; science users and use cases; requirements summary.

Data flow in brief
Diagram: Murchison Radio Observatory, Pawsey Centre, users worldwide.
Central Processor: processes the raw visibilities and outputs science data products.
Science Data Archive Facility: data storage and access to science data products.

ASKAP Central Processor
Data from the same set of visibilities can be passed through three pipelines, for transient, continuum and spectral-line imaging. In principle, this allows up to three different experiments to be scheduled simultaneously.
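To make the fan-out concrete, here is a minimal Python sketch that assumes nothing about the real Central Processor software: one visibility data set is handed unchanged to up to three independent pipeline functions. `VisibilitySet`, its `sbid` field and the pipeline function names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisibilitySet:
    """Hypothetical stand-in for one set of calibrated visibilities."""
    sbid: int  # hypothetical schedule-block identifier


# Hypothetical pipelines: real ones would produce images and tables;
# here each just returns a label describing its product.
def transient_pipeline(vis: VisibilitySet) -> str:
    return f"SB{vis.sbid}: transient images and detections"


def continuum_pipeline(vis: VisibilitySet) -> str:
    return f"SB{vis.sbid}: continuum images"


def spectral_line_pipeline(vis: VisibilitySet) -> str:
    return f"SB{vis.sbid}: spectral-line cubes"


PIPELINES = {
    "transient": transient_pipeline,
    "continuum": continuum_pipeline,
    "spectral_line": spectral_line_pipeline,
}


def run_experiments(vis: VisibilitySet, selected: list[str]) -> dict[str, str]:
    """Pass the same visibility set through every selected pipeline."""
    return {name: PIPELINES[name](vis) for name in selected}


print(run_experiments(VisibilitySet(sbid=1), ["continuum", "transient"]))
```

A production system would presumably run the pipelines concurrently; the dictionary dispatch only illustrates that a single input stream can serve several experiments.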

Digression: some radio astronomy terms
Radio astronomy images are created by Fourier transforming the observed visibilities (diagram: visibilities, Fourier transform, image with longitude and latitude axes). A single image is a map of a region of sky; it may show the sky brightness or another measured parameter. An image cube is a set of images of one region covering a range of frequency (spectral) channels.
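The visibility-to-image step can be illustrated with a textbook sketch, which is not ASKAP's imaging software: a direct (slow) inverse discrete Fourier transform of a few hypothetical baselines observing a single point source. Real imagers grid the visibilities, use an FFT and then deconvolve; this sketch stops at the undeconvolved "dirty" image and ignores the w term (a small-field approximation). The (u, v) coordinates and source position are made up; l and m are direction cosines standing in for the longitude and latitude axes.

```python
import cmath
import math


def point_source_visibilities(uv, flux, l0, m0):
    """Visibilities of one point source of the given flux at (l0, m0).

    uv holds (u, v) baseline coordinates in wavelengths; l0, m0 are
    direction cosines. V(u, v) = flux * exp(-2*pi*i*(u*l0 + v*m0)).
    """
    return [flux * cmath.exp(-2j * math.pi * (u * l0 + v * m0)) for u, v in uv]


def dirty_image(uv, vis, l_grid, m_grid):
    """Direct inverse Fourier transform of the visibilities onto an (l, m) grid.

    I(l, m) = (1/N) * sum_k Re[V_k * exp(+2*pi*i*(u_k*l + v_k*m))].
    Returns a list of rows, one per m value.
    """
    n = len(vis)
    image = []
    for m in m_grid:
        row = []
        for l in l_grid:
            total = sum(
                (vk * cmath.exp(2j * math.pi * (u * l + v * m))).real
                for (u, v), vk in zip(uv, vis)
            )
            row.append(total / n)
        image.append(row)
    return image


# Hypothetical (u, v) sampling and a 2 Jy point source at l = 0.01, m = -0.01.
uv = [(10.0, 3.0), (-7.5, 12.0), (25.0, -18.0), (4.0, 30.0), (-22.0, -9.5), (15.5, 21.0)]
grid = [i * 0.01 for i in range(-3, 4)]
vis = point_source_visibilities(uv, flux=2.0, l0=0.01, m0=-0.01)
img = dirty_image(uv, vis, grid, grid)

# The brightest pixel should sit at the source position.
peak, l_peak, m_peak = max(
    (value, l, m) for row, m in zip(img, grid) for value, l in zip(row, grid)
)
print(f"peak {peak:.3f} Jy at l={l_peak}, m={m_peak}")
```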

Figure: OH spectral-line emission from an evolved star.
Continuum image cubes (small-n) have up to 300 frequency channels (300 x 1 MHz). Spectral-line image cubes (large-n) have up to 16,200 spectral channels. Transient observations generate one image every 5 seconds.

ASKAP Data Processing levels

ASKAP Science Data Archive (ASDA)
ASKAP is responsible for the storage of, and access to, level 5 data products. Survey Science Project teams are responsible for level 6 data validation. Astronomers and their science teams are responsible for generating level 7 enhanced data products.

ASDA: Schematic
Diagram components: data management layer; HSM with tape libraries (2 x 20 PB) and disk storage (> 4 PB); science database nodes (storage ~50 TB); portal nodes with web servers and database cache storage (24 TB); VO services; firewall; international astronomy community.

ASDA Data Flows Schematic

ASDA Components
- Hierarchical Storage Management (HSM) system with two 20 PB tape libraries (one a backup of the other) and several PB of disk storage.
- Science database management system: 4 science database nodes with ~50 TB of storage.
- Web servers with ~24 TB of cached data storage.
- Virtual Observatory interface for external users.
- Workflow processes.
- User support and training.
- Access to High Performance Computing facilities.

ASDA: Data Management Layer
The HSM will control the transfer of data to and from storage media. An additional data management layer is needed for the science archive. This will include software for:
- generation of science metadata and updates to the science database;
- user registration and data access controls;
- user services, including web interfaces, search forms and Virtual Observatory services;
- workflow management of interactions between the archive, the Central Processor, the HSM and users.
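The HSM's role can be illustrated with a toy model of hierarchical storage. This is a generic sketch, not the ASDA design: every ingested file keeps a copy on tape, a small disk tier caches recently used files, and a request for a file that has been evicted from disk triggers a slow tape recall. The class name, the least-recently-used eviction policy and the capacity counted in files are all assumptions.

```python
from collections import OrderedDict


class TieredStore:
    """Toy hierarchical storage: an LRU disk cache in front of a tape tier.

    Hypothetical sketch; capacity is counted in files, not bytes.
    """

    def __init__(self, disk_capacity: int):
        self.disk_capacity = disk_capacity
        self.disk = OrderedDict()  # file name -> data, least recently used first
        self.tape = {}             # file name -> data (permanent copy)
        self.tape_recalls = 0      # number of slow reads from tape

    def archive(self, name, data):
        """Ingest a file: write it to tape and keep a cached copy on disk."""
        self.tape[name] = data
        self._cache(name, data)

    def retrieve(self, name):
        """Serve from disk if possible, otherwise recall from tape first."""
        if name in self.disk:
            self.disk.move_to_end(name)  # mark as most recently used
            return self.disk[name]
        data = self.tape[name]
        self.tape_recalls += 1
        self._cache(name, data)
        return data

    def _cache(self, name, data):
        self.disk[name] = data
        self.disk.move_to_end(name)
        while len(self.disk) > self.disk_capacity:
            self.disk.popitem(last=False)  # evict the least recently used file


store = TieredStore(disk_capacity=2)
for name in ["cube_a", "cube_b", "cube_c"]:  # cube_a is evicted from disk
    store.archive(name, name.upper())
store.retrieve("cube_c")  # still on disk: no tape recall
store.retrieve("cube_a")  # recalled from tape, evicting cube_b
print(store.tape_recalls, list(store.disk))
```

With several PB of disk in front of the tape libraries, a real policy would likely also weigh file size and expected access patterns; LRU is simply the easiest policy to show.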

Science Data Access
Access to files, tables and search results will be provided through Virtual Observatory services:
- Simple Image Access Protocol: returns links to images/cubes identified for a given position, and can generate image cut-outs.
- Cone searches: return table results, such as positions and fluxes, for sources detected within an area around a given position.
- Table Access Protocol: allows complex querying of tables. For example, it could return a list of detections for sources above a given flux density with negative spectral indices (slopes).
Note: radio astronomers are less familiar with VO services than optical astronomers. Full adoption of VO services will require a cultural shift for users.
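As a sketch of what such requests might look like, the Python below builds query URLs for the three VO services using only the standard library. The parameter names (RA, DEC and SR for cone search; POS and SIZE for Simple Image Access version 1; REQUEST, LANG and QUERY on a TAP `/sync` endpoint) follow the IVOA specifications, but the service URLs, table name, column names and flux units are hypothetical, since the slides do not define them.

```python
from urllib.parse import urlencode

# Hypothetical endpoints: the slides do not give ASDA service URLs.
CONE_URL = "https://archive.example.org/scs"
SIA_URL = "https://archive.example.org/sia"
TAP_URL = "https://archive.example.org/tap"


def cone_search_url(ra_deg, dec_deg, radius_deg):
    """Simple Cone Search: RA, DEC and search radius SR in decimal degrees."""
    return CONE_URL + "?" + urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})


def sia_url(ra_deg, dec_deg, size_deg):
    """Simple Image Access (v1): images overlapping POS=ra,dec within SIZE degrees."""
    return SIA_URL + "?" + urlencode({"POS": f"{ra_deg},{dec_deg}", "SIZE": size_deg})


def tap_sync_url(adql):
    """Table Access Protocol: synchronous ADQL query."""
    return TAP_URL + "/sync?" + urlencode({"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql})


# The slide's TAP example: detections above a flux-density limit with a
# negative spectral index. Table and column names (and flux units) are hypothetical.
ADQL = (
    "SELECT source_id, ra, dec, flux_int, spectral_index "
    "FROM askap.continuum_detections "
    "WHERE flux_int > 10 AND spectral_index < 0"
)

print(cone_search_url(187.7, 12.4, 0.5))
print(sia_url(187.7, 12.4, 0.2))
print(tap_sync_url(ADQL))
```

ADQL can also express the cone itself, e.g. `WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), CIRCLE('ICRS', 187.7, 12.4, 0.5))`, which lets a single TAP request combine positional and flux cuts.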

ASDA Primary Data Products
Product | Data type
Calibrated continuum visibility data (stored as measurement sets) | File
Continuum (small-n) images and image cubes | File
Spectral line (large-n) image cubes | File
Postage stamp images (quick-look images produced with fewer pixels) | File
Continuum source detection tables | Table
Spectral line source detections | Table
Transient source detections | Table
Transient light curves | Table
Project source catalogues | Table

Examples: File-based Data Products
Product | Parameters used | Data size
12-hr calibrated continuum visibility data set | 36 beams, 300 channels, 4 polarisations, 666 baselines, 5 s integration | 2.2 TB
Set of three continuum images (restored, residual and model) | Imsize = 10,800 pix, 1 pol, 1 spectral channel, 3 images per position | 5.4 GB
Set of four polarisation continuum image cubes | 300 spectral channels, 4 pols | 556 GB
Single spectral line image cube | Imsize = 3,600 pix, 16,200 channels, 1 pol | 839 GB
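Two rows of this table can be reproduced with simple arithmetic, under assumptions the slide does not state: image pixels stored as 4-byte floats, and each visibility sample stored as an 8-byte single-precision complex number, ignoring flags, weights and headers. With 4-byte pixels the spectral line cube comes to 839.8 GB, matching the table. The visibility estimate of about 1.99 TB falls somewhat below the quoted 2.2 TB, a gap that could plausibly be taken up by flags, weights and metadata. The function names are illustrative.

```python
# Assumptions (not from the slide): 4-byte image pixels, 8-byte complex
# visibilities, no flags/weights/headers. Decimal units: 1 GB = 1e9 bytes.
BYTES_PER_PIXEL = 4
BYTES_PER_VISIBILITY = 8


def image_bytes(npix, nchan, npol, nimages=1):
    """Size of nimages square image cubes with npix x npix pixels."""
    return npix * npix * nchan * npol * nimages * BYTES_PER_PIXEL


def visibility_bytes(hours, integration_s, nbaselines, nbeams, nchan, npol):
    """One complex sample per integration, baseline, beam, channel and polarisation."""
    n_integrations = int(hours * 3600 // integration_s)
    return n_integrations * nbaselines * nbeams * nchan * npol * BYTES_PER_VISIBILITY


cube = image_bytes(npix=3600, nchan=16200, npol=1)
vis = visibility_bytes(hours=12, integration_s=5, nbaselines=666, nbeams=36, nchan=300, npol=4)
print(f"spectral line cube: {cube / 1e9:.1f} GB")  # table: 839 GB
print(f"12-hour visibilities: {vis / 1e12:.2f} TB")  # table: 2.2 TB
```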

ASKAP Operations
The longest baseline is 6 km. 36 antennas give 630 baselines.
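The baseline count follows from counting distinct antenna pairs among N = 36 antennas. Adding the 36 autocorrelations gives 666, which is presumably why the file-based data products table lists 666 baselines; the slides do not say so explicitly.

```latex
N_{\mathrm{cross}} = \frac{N(N-1)}{2} = \frac{36 \times 35}{2} = 630,
\qquad
N_{\mathrm{cross}} + N = \frac{N(N+1)}{2} = \frac{36 \times 37}{2} = 666.
```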

ASKAP Operations
ASKAP will be operated as part of the Australia Telescope National Facility (ATNF). Current ATNF facilities are the Parkes radio telescope, the Australia Telescope Compact Array, the Mopra radio telescope and the Long Baseline Array. National Facility time is allocated on the basis of scientific merit and technical feasibility. Data taken on ATNF facilities are owned by CSIRO. All ASKAP observations will be taken remotely (usually from the Marsfield Science Operations Centre, SOC). Science data will be available to all astronomers, in most cases with NO proprietary period.

ASKAP Archive Users
Astronomy group | Number of individuals
Members of Survey Science Project teams | 350
Members of Guest Science Project teams | 400
General astronomy community | 750+
CASS Astrophysics group | 30
Supported by CASS Science Operations | 30
ivec Operations | ~15
Technical developers | tbd

ASKAP Survey Science Projects (SSPs)
Projects need at least 1500 hours of telescope time. Ten SSPs were selected in 2009, and their science goals are drivers of ASKAP development. SSP teams will work with CASS staff on aspects of science and archive commissioning. Project teams are responsible for validating their data, i.e. they will assess whether the data taken are of science quality. Observations that produce bad data may be retaken.

ASKAP science: EMU (C)
EMU (Evolutionary Map of the Universe) is a deep radio continuum (small-n) survey that will study the evolution of star formation and massive black holes in galaxies over cosmological timescales. The survey is expected to detect about 70 million sources.
EMU observations and archiving: observations will be taken in 12-hour schedule blocks, with one block per field. Visibility data will be stored for 4 polarisation products and 300 spectral channels. Images will be archived for 1 polarisation and 1 spectral channel, with 11 images per field for full characterisation of the spectrum and image quality. Source detection tables will be stored.

ASKAP science: WALLABY (S)
WALLABY is a blind spectral-line (large-n) survey for neutral hydrogen. The survey is expected to detect spectral-line emission from over 500,000 galaxies.
WALLABY data processing and archiving: the Central Processor will process 97 PB of visibility data, which will NOT be archived. WALLABY images will be made with 1 polarisation and 16,200 spectral channels. Three images per field will be archived (residual, restored and model). Two sets of postage stamp images will also be archived, dividing each field into 350 and 1000 sub-regions; these allow quick looks at smaller regions.

ASKAP science: VAST (T)
VAST is a survey for variable sources and slow transients, such as flare stars, intermittent pulsars, X-ray binaries and variable extragalactic sources.
VAST data processing and archiving: VAST observations will mostly be taken in piggy-back mode, using the same input data stream taken for other projects. Visibility data are averaged to ~30 channels, and VAST processing produces one image every five seconds. The images are searched for sources, typically yielding about 500 to 1000 source detections per 5-second image, and the detections are written to tables as rows. The images themselves are NOT archived.
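A rough estimate using only the slide's figures (one image every 5 seconds, about 500 to 1000 detections per image) shows why VAST's archive footprint consists mainly of table rows rather than files. The helper name is hypothetical.

```python
IMAGE_CADENCE_S = 5  # one VAST image every 5 seconds


def detection_rows(hours, detections_per_image):
    """Number of detection-table rows written during `hours` of observing."""
    images = int(hours * 3600 // IMAGE_CADENCE_S)
    return images * detections_per_image


for per_image in (500, 1000):
    print(f"{per_image} per image: {detection_rows(1, per_image):,} rows/hour, "
          f"{detection_rows(8, per_image):,} rows per 8-hour observation")
```

At 360,000 to 720,000 rows per hour, the table database and its indexing, rather than file storage, would likely set the scale for VAST.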

Survey Science Projects: file-based data sizes in ASDA
SSP | Type | Nfields | Time per field (h) | Visibility data size (PB) | Image data size (PB)
EMU | C | 1200 | 12 | 2.7 | 0.003
POSSUM | C | 1200 | 8 | 1.8 | 1.2
WALLABY | S | 1200 | 8 | NA | 2.1
DINGO | S | 966 | 8 | NA | 1.3
FLASH | S | 850 | 4 | NA | 0.04
GASKAP | S | 644 | 12 | NA | 0.9
VAST | T | 1200 | 8 | Small | NA
Type: C = continuum, S = spectral line, T = transient.
Notes: data sizes for two other SSPs, CRAFT (fast transients) and COAST (pulsars), are not included here.
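As a consistency check on the visibility column, assume that the 2.2 TB per 12-hour calibrated visibility data set (from the file-based data products table) scales linearly with observing time. This gives about 2.6 PB for EMU and 1.8 PB for POSSUM, close to the tabulated 2.7 PB and 1.8 PB. The linear-scaling assumption and the function name are illustrative, not from the slides.

```python
# Assumption: visibility volume scales linearly from 2.2 TB per 12 hours.
TB_PER_HOUR = 2.2 / 12

# (number of fields, hours per field), taken from the SSP data-size table.
SURVEYS = {"EMU": (1200, 12), "POSSUM": (1200, 8)}


def visibility_pb(nfields, hours_per_field):
    """Estimated archived visibility volume in PB (1 PB = 1000 TB)."""
    return nfields * hours_per_field * TB_PER_HOUR / 1000


for name, (nfields, hours) in SURVEYS.items():
    print(f"{name}: {nfields * hours:,} hours, ~{visibility_pb(nfields, hours):.2f} PB")
```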

Guest Science Projects
Guest Science Projects (GSPs) will be allocated up to 25% of the time. They may be scheduled together with SSPs (using parallel data processing pipelines). Science cases are not yet established. Proposals will be submitted as for other ATNF facilities, on a 6-month cycle. For 750 hours per semester, about 5 to 10 GSPs can be expected to be scheduled.

ASDA Example Use Cases
Group | Expertise level | Example use cases (preliminary)
General use | Low | Download a small number of full-sized images; download larger numbers of postage stamp images; cone searches to identify sources of interest; more complex table searches; download light-curve information for given source(s); comparison of ASKAP data with data from other facilities.
GSP teams | Medium | As above, plus: some post-processing of data may be requested; some data visualisation.
Note: many (but not all) requirements may be largely satisfied through web and VO services.

ASDA science use: Survey Science Projects
Group: SSP teams. Expertise level: high.
Example use cases (preliminary), as for general use and:
- production of final source catalogues;
- high-volume data downloads for data validation;
- re-analysis of visibility data to generate images with different parameters;
- data visualisation across large sky regions;
- post-analysis of image cubes, such as image stacking;
- rapid follow-up observations of transient sources.
Most SSP teams are likely to require access to High Performance Computing (HPC) and temporary data storage. HPC services are outside the scope of the CASS ASKAP project; they can be provided by ivec and other groups.

Preliminary ASDA requirements
Essential requirements:
- Data ingest from the Central Processor (CP) as the data products become available.
- Data transfer to tape and/or disk.
- Indefinite data storage, with backups for all archived files, tables and associated metadata.
- Potential for the data archive to be mirrored to other locations.
- Disaster recovery plans to minimise loss of data following a crisis such as fire or flood.
- Efficient data retrieval from tapes and/or disks.
- Fast access to some data products (source catalogues, tables, some images).
- Extensible and scalable: the design should allow for additional data products, increased data storage and new technologies.
- User registration through OPAL (as for other ATNF facilities).
- Validation tool for SSP teams (to set a metadata flag).

Essential requirements (continued):
- User queries through web interfaces.
- Data retrieval across the internet using VO services.
- Queue management for multiple user requests.
- Administration-level access and tools for Operations staff from CASS and ivec.
Highly desirable requirements:
- ASKAP provision for SSP teams to upload final source catalogues.
- User support and access to HPC facilities through ivec.

Thank you
Jessica Chapman, CSIRO Astronomy and Space Science
t +61 2 9372 4196, e Jessica.Chapman@csiro.au, w atnf.csiro.au