Canadian Astronomy Data Centre. Séverin Gaudet David Schade Canadian Astronomy Data Centre

Similar documents
MAST: The Mikulski Archive for Space Telescopes

How To Understand And Understand The Science Of Astronomy

CADC and CANFAR: Extending the role of the data centre. Séverin Gaudet Canadian Astronomy Data Centre

Datamanagement at the European Southern Observatory: Strategies and Challenges. Michael Sterzik, ESO Datamanagement and Operations Division

The Virtual Observatory: What is it and how can it help me? Enrique Solano LAEFF / INTA Spanish Virtual Observatory

AST 4723 Lab 3: Data Archives and Image Viewers

ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS)

irods in complying with Public Research Policy

Data Management at UT

Organization of VizieR's Catalogs Archival

EA-ARC ALMA ARCHIVE DATA USER GUIDEBOOK

on the establishment of a Brazilian Science Data Center (BSDC) General Guidelines

The astronomical Virtual Observatory : lessons learnt, looking forward. Françoise Genova - Forum VO-PDC d après ADASS XXI, Paris, nov.

RESEARCH DATA MANAGEMENT POLICY

Redefining Microsoft SQL Server Data Management. PAS Specification

Data Lab System Architecture

Survey of Canadian and International Data Management Initiatives. By Diego Argáez and Kathleen Shearer

THE CCLRC DATA PORTAL

The LSST Data management and French computing activities. Dominique Fouchez on behalf of the IN2P3 Computing Team. LSST France April 8th,2015

Data Management using irods

An IDL for Web Services

Redefining Microsoft Exchange Data Management

Long Term Preservation of Earth Observation Data

NASA's Postdoctoral Fellowship Programs

Introduction to NetApp Infinite Volume

Planning and Scheduling Software for the Hobby Eberly Telescope. Niall I. Gaffney Hobby Eberly Telescope Mark E. Cornell McDonald Observatory

OpenAIRE Research Data Management Briefing paper

EUROPEAN COMMISSION Directorate-General for Research & Innovation. Guidelines on Data Management in Horizon 2020

IoT R&I on IoT integration and platforms INTERNET OF THINGS FOCUS AREA

Massive Labeled Solar Image Data Benchmarks for Automated Feature Recognition

Data Mining Challenges and Opportunities in Astronomy

National Science Foundation Office of Inspector General 4201 Wilson Boulevard, Suite I-1135, Arlington, Virginia 22230

H2020 Guidelines on Open Data and Data Management Plan

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

Data transport in radio astronomy. Arpad Szomoru, JIVE

Data-Intensive Science and Scientific Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Willem Elbers

How To Write A Blog Post On Globus

Hitachi NAS Platform and Hitachi Content Platform with ESRI Image

August, 2000 LIGO-G D.

Geospatial Data Archiving

Data Archiving for Littelfuse Paved the Way for One Day SAP ERP ECC 6.0 Upgrade

Privacy and Security within an Interoperable EHR

Breaking Down the Silos: A 21st Century Approach to Information Governance. May 2015

Digital preservation a European perspective

NASA s Big Data Challenges in Climate Science

ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013

XenData Video Edition. Product Brief:

Vodacom Managed Hosted Backups

Research Data Management in Horizon 2020

Exploitation of ISS scientific data

SURFsara Data Services

LIBER Case Study: University of Oxford Research Data Management Infrastructure

GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington

Virginia Commonwealth University Rice Rivers Center Data Management Plan

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 st CENTURY SCIENCE AND ENGINEERING (CIF21)

Canadian National Research Data Repository Service. CC and CARL Partnership for a national platform for Research Data Management

Local Loading. The OCUL, Scholars Portal, and Publisher Relationship

Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace

TERRITORY RECORDS OFFICE BUSINESS SYSTEMS AND DIGITAL RECORDKEEPING FUNCTIONALITY ASSESSMENT TOOL

Arkivum's Digital Archive Managed Service

SAR Archive and Community Support Activities at UNAVCO

Transcription:

Canadian Astronomy Data Centre Séverin Gaudet David Schade Canadian Astronomy Data Centre

Data Activities in Astronomy Features of the astronomy data landscape Multi-wavelength datasets are increasingly important scientifically More large, homogeneous survey datasets are being produced Open data policies With pre-defined proprietary periods Good IT infrastructure Standards for data interoperability Common file formats Virtual Observatory project Immersed in a world of remarkable technical capabilities

The Data Benchmark Since 1990, the Hubble Space Telescope Science Archive has set the standard Mandated by NASA, supported by ESA and CSA Public data policy Accessibility Processed products In 2005, the papers based on archival data exceeded PI papers 3

The Data Benchmark 4

International Virtual Observatory Alliance

Data curation (paraphrased from Wikipedia) The process of identification and organisation of objects in a collection in order to further knowledge. Includes verification and additions to the existing metadata for objects. The process of examining, testing and selecting metadata to go in a collection database. Activities of a Data Centre

Activities of a Data Centre Data curation (paraphrased from Wikipedia) The process of identification and organisation of objects in a collection in order to further knowledge. Includes verification and additions to the existing metadata for objects. The process of examining, testing and selecting metadata to go in a collection database. Activities Data transfer and ingestion Data modelling and characterisation Data processing Data discovery Data distribution Data preservation

Canadian Astronomy Data Centre Created in 1986 following a CASCA resolution in 1985 Original mandate: to serve Hubble Space Telescope data Original name: the Canadian Space Astronomy Data Centre Supported in part by the Canadian Space Agency since mid 90s Now a national facility serving many of Canada s major telescopes

Canadian Astronomy Data Centre HST FUSE MOST CFHT Gemini N Gemini S JCMT CGPS MACHO Heterogeneous collection: multiple missions and facilities multiple wavelengths Pointed and survey observations Many different archive data models

Growth in CADC Collections 300 250 200 Terabytes 150 100 50 0 2003 2004 2005 2006 2007 Year In 2007: > 80TB 8 major sources of data Network transfer only

Current Collection 2008-01-11 Archive Number of Files Number of GB BLAST 19 1 CFHT 1,946,588 174,950 DSS 7,268,759 952 FUSE 3,433,858 2,709 GEMINI 1,689,389 5,676 GPS 2,074 80 HST 12,785,565 59,314 IRIS 1,720 2 JCMT 740,652 1,364 MACHO 2,066,319 15,230 MOST 327 11 Total 29,964,423 260,486 Two compressed copies on disk at HIA One compressed copy on tape at UVic

Activities Data Processing Data processing Removing instrument and telescope signatures Combining multiple images Useable science products Processing close to the data Examples: HST WFPC2 HST ACS CFHT Megacam

Activities Data Processing Data processing Removing instrument and telescope signatures Combining multiple images Useable science products Processing close to the data Examples: HST WFPC2 HST ACS CFHT Megacam

Activities Data Processing Data processing Removing instrument and telescope signatures Combining multiple images Useable science products Processing close to the data Examples: HST WFPC2 HST ACS CFHT Megacam

Activities Data Discovery

Activities Data Discovery

Activities Data Discovery

Activities Data Discovery

Inter-archive Links

Inter-archive Links

Inter-archive Links

Inter-archive Links

Data distribution Select and download Direct programmatic access Asynchronous retrieval Activities Data Distribution

Data distribution Select and download Direct programmatic access Asynchronous retrieval Activities Data Distribution

Data distribution Select and download Direct programmatic access Asynchronous retrieval Activities Data Distribution

Activities Data Distribution Data distribution Select and download Direct programmatic access Asynchronous retrieval Retrieve an HST ACS drizzle image curl -g -o J8OZ02010_DRZ.fits.gz http://www.cadc.hia.nrc.gc.ca/ anonproxy/getdata?archive=hst&file_id=j8oz02010_drz Retrieve extension 10 of a proprietary CFHT12K image curl -g -u username:password -o 687344o_10.fits.gz http:// www.cadc.hia.nrc.gc.ca/authproxy/getdata? archive=cfht&file_id=687344o&cutout=[10]

CADC Data Distribution In 2007: > 1 million files, > 58 TB Provided data and services to > 2500 distinct hosts worldwide Network distribution only 100.00 10.00 1.00 0.10 0.01 0.00 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 Terabytes (log) Year

Users in 87 countries CADC Data Flows

International Virtual Observatory Alliance Mission: To facilitate the international coordination and collaboration to enable the international utilization of astronomical archives as an integrated and interoperating virtual observatory. Formed in 2002 A culture of data sharing

Developing: Data models Data query and access protocols Service discovery Tools Data centres bear the cost of implementation Benefit from world-wide development efforts International Virtual Observatory Alliance

VO Tools

VO Tools http://www.cadc.hia.nrc.gc.ca Octet http://www.cadc.hia.nrc.gc.ca/cvo/

Lessons Learned Archives are not just a technological exercise: they are science projects! Multidisciplinary team necessary Enable the user to find relevant data Well described reliable data Good interfaces with data providers Good interfaces with user communities End-to-end data management is part of the whole mission design retro-fitting is not fun!

Change is driven by... Data providers (telescopes) User community Funding agencies and by... Enabling technologies Virtual Observatory Managing Change

The Changing Role Data Providers More services Data distribution Processing Data management Quality of service 24x7 availability Robust infrastructure Fail-over systems

The Changing Role User Community Improved access Anonymous access to public data Authenticated access to proprietary data Direct programmatic access An extension of the user s storage User defined processing Quality of service 24x7 availability Robust infrastructure Fail-over systems Community projects

The evolving data centre role The Virtual Observatory Continuous improvement of services to the user community Promotion and training Maintaining and fostering international collaborations Technology New missions (e.g. JWST, UVIT,...) Knowledge retention The last network mile Funding agencies Challenges

www.cadc.hia.nrc.gc.ca