MAST: The Mikulski Archive for Space Telescopes
|
|
|
- Dorothy Curtis
- 10 years ago
- Views:
Transcription
1 MAST: The Mikulski Archive for Space Telescopes Richard L. White Space Telescope Science Institute 2015 April 1, NRC Space Science Week/CBPSS
2 A model for open access The NASA astrophysics data archives are a model for open access to data. See white paper by S. Murray & M. Postman on public access to space astronomy data. MAST has provided open access to science data for more than 25 years. Hubble changed the paradigm: provide calibrated, science-ready data to the entire community after a proprietary period of 1 year (or less). Archival research greatly enhances the science at a small fraction of the total mission cost. 4/1/15 NRC Space Science Week 2
3 Lessons learned Use standard data formats Capture complete & accurate metadata Create a data model to enable cross-mission research Engage science teams for expertise & support Motivation for the team: increased scientific impact Key for STScI: many scientific staff are advanced users of the data Generate high-level, science-ready data products Build expertise into pipelines Community-contributed products are also important Provide powerful, web-based tools for data mining 4/1/15 NRC Space Science Week 3
4 HR8799 b,c,d imaged by HST in 1998 planet b: 83,000x fainter than star at 1.72 arcsec planet c: 36,000x fainter than star at 0.96 arcsec planet d: 33,000x fainter than star at 0.60 arcsec These results were made possible by post-processing speckle subtraction and achieve over an order of magnitude contrast improvement over the state of the art when the data was taken in Soummer et al. 2011, Pueyo et al. 2014
5 Introduction to MAST MAST: a NASA astrophysics data archive center Archive established with HST launch in 1990 Multi-mission since addition of IUE in active missions including Hubble, Kepler Many legacy missions: GALEX, IUE, FUSE, Future: TESS, JWST, WFIRST, 4/1/15 NRC Space Science Week 5
6 MAST Holdings 21 missions/projects (current and planned) Multiple wavebands from far ultraviolet to infrared 4/1/15 NRC Space Science Week 6
7 MAST Searches 2.2 million searches per month in 2014 > 12,000 registered archive users 4/1/15 NRC Space Science Week 7
8 MAST Data Distribution The archive distributes 7 10x more data than is archived each month. The average for 2014 was 18.5 TB/month. 4/1/15 NRC Space Science Week 8
9 MAST Data Growth 320 TB of data products MAST ingest rate 2.7 TB / month (2014) 4/1/15 NRC Space Science Week 9
10 Long-term MAST Data Growth Pan-STARRS survey (not NASA-funded) 4/1/15 NRC Space Science Week 10
11 HST Publication Rate Over the last 4 years, 1200 papers were published each year using MAST data. The publication rate for totally archival Hubble papers has exceeded the non-archival (GO/PI) publication rate every year since /1/15 11
12 MAST data are diverse MAST data are diverse: in date of observations: from 1972 to now in data types: images, spectra, light curves, catalogs, models in scale: from Pan-STARRS (2 PB images TB database) to small shuttle-based missions to community-contributed projects with just a few files in processing level: from raw data packets delivered directly from spacecraft to science-ready, high-level science products Many different missions and instruments Hubble alone: 12 different instruments, 17 varieties of detector, hundreds of instrument modes/filters/etc. 4/1/15 12
13 Dealing with data diversity 1 Accept diversity as a fact Metadata, telemetry, calibration data, etc., are always going to be specific to the mission/instrument Implement diverse storage & access mechanisms Create separate mission databases Supply custom interfaces for advanced mission searches Store mission-specific info in standard configuration files Build interfaces automatically from config data Allow direct database (DB) access by users (CasJobs) Supports large queries using all DB parameters User work areas for uploading and creating user DBs Pros: Preserves all details of mission-specific data Cons: - Significant manual effort required for new missions & projects - Mission-specific user expertise required to use advanced features 4/1/15 NRC Space Science Week 13
14 Dealing with data diversity 2 Create homogeneous views to integrate diverse data Common Archive Observation Model (CAOM) Data model defining common subset of metadata for all missions Virtual Observatory (VO) protocols Common scriptable interfaces for data access Used by users, between archives, and inside the archive MAST Discovery Portal interface Single interface with access to all MAST data Tools for previewing, selecting, analyzing, & downloading data Can also access other archives through VO protocols Most current MAST work focuses on this unifying approach Pros: New features are usable across missions Easier for users to learn Cons: - Some mission-specific info not accessible (but still exists) 4/1/15 NRC Space Science Week 14
15 MAST lessons learned: data Use standard data formats Astronomy has used FITS for decades Self-describing, open data format for images, tables, etc. with embedded simple metadata description FITS is showing its age (not good for hierarchical data) Expect formats and conventions to evolve HST used 4 different FITS data formats for 4 spectrographs! Capture complete & accurate metadata But metadata will evolve too for most missions 4/1/15 NRC Space Science Week 15
16 MAST lessons learned: data products 1 Generating science-ready, high-level science products (HLSPs) is key to enabling archival science. There is a false savings in delivering only raw data products: the costs of processing data then are incurred many times over by different users in different locations. Archives that deliver only raw data are much less useful and are much less used than archives that deliver science-ready products. HST data processing pipelines generate calibrated datasets that are science-ready. On-the-fly reprocessing from raw data uses current instrument calibrations Astronomical data is difficult: single-photon counting with extreme demands on calibration & sensitivity If we can do it, you can generate science-ready data too! 4/1/15 NRC Space Science Week 16
17 MAST lessons learned: data products 2 Community-contributed HLSP are even better Used 10x as much as typical pipeline products Rely on scientific refereeing process to ensure data quality Many advanced in data processing algorithms originate in these projects Improvements are used in advanced MAST pipelines (e.g., Hubble Legacy Archive) to generate higher level products modeled on community projects Build as much expertise as possible into sophisticated pipelines & well-documented data products These may be the only products that remain easily usable when community expertise fades after the mission ends 4/1/15 NRC Space Science Week 17
18 Sample HLSPs: HST Multi-Cycle Treasury Programs CANDELS (Faber/Ferguson) 3.1 TB of data ~ 53 TB distributed to 2178 IP addresses CLASH (Postman) 0.6 TB of data ~ 5.7 TB distributed to 1637 IP addresses PHAT (Dalcanton) 2.0 TB of data ~ 8.9 TB distributed to 2485 IP addresses 18
19 MAST lessons learned: capturing expertise Capture the knowledge of experts on data and instruments Keep data close to science team while mission is active MAST provides support ranging from archive/database specialists to software developers to instrument scientists A key for STScI is that many of our scientific staff are both calibration experts and advanced users of our data who push the archive capabilities and data products Plan for closeout at end of mission 4/1/15 NRC Space Science Week 19
20 MAST lessons learned: engage the teams The best advocates for any data set are the team that designed & built the experiment and collected the data. Teams working on archival projects are also a valuable resource. If you can engage the team, they will produce the higher-level products that are necessary to enhance the archival value of the data. Why would the science teams want to expend that effort? We show them that they will increase the scientific impact of their work (and increase the citations for their papers!) by contributing high-level science products. It is good for them and good for science. 4/1/15 NRC Space Science Week 20
21 MAST lessons learned: user interfaces Web interfaces are much preferred to downloaded software Web software is always up-to-date Everybody has a browser Example: old Starview MAST interface (Java application) was used by only 1% of users despite additional functionality Challenges: Evolving web technology But HTML5 + Javascript + server-side software is now standard and widely supported See MAST portal as example 4/1/15 NRC Space Science Week 21
22 MAST lessons learned: plan for change An archive is an evolving collection of data products, interfaces, and access services. It is not simply a fixed collection of data! A long-lived data archive must evolve along with the world and the user community. At Hubble launch in 1990: There was no internet Disk storage cost $20 million per terabyte Operating an up-to-date archive in the midst of rapid change is hard! 4/1/15 NRC Space Science Week 22
23 Disk storage prices since Hubble launch 4/1/15 NRC Space Science Week 23
24 Large data products challenge the open access model Downloading some datasets is simply not practical Pan-STARRS has 2 petabytes of images along with a 100 terabyte catalog database The GALEX photon database is a 150 TB table with all 1 trillion photons collected by the GALEX ultraviolet sky survey mission Essential to provide tools that allow users to query, browse, and mine big data without downloading it Allow downloads of selected data and dynamically generated products (image cutouts, catalog subsets, light curves, movies) but does that satisfy all requirements for open data access? 4/1/15 NRC Space Science Week 24
25 Summary An archive is not a bit bucket it is a living, evolving science machine. Archives can greatly enhance science at a relatively small cost. Once the infrastructure is in place, the incremental cost for new missions is modest. Keep the archive close to the scientists, who are needed to support and enhance the data & tools and who push the data to the limit. NASA s long-term funding for a network of archive data centers that support astrophysics data from gamma-rays to the microwave background has played a key enabling role. Successful research using archival data sets is dependent on the resident expertise and corporate memory that reside at the science centers. Portals to the Universe: The NASA Astronomy Science Centers, National Academies Press, /1/15 NRC Space Science Week 25
Canadian Astronomy Data Centre. Séverin Gaudet David Schade Canadian Astronomy Data Centre
Canadian Astronomy Data Centre Séverin Gaudet David Schade Canadian Astronomy Data Centre Data Activities in Astronomy Features of the astronomy data landscape Multi-wavelength datasets are increasingly
Lecture 5b: Data Mining. Peter Wheatley
Lecture 5b: Data Mining Peter Wheatley Data archives Most astronomical data now available via archives Raw data and high-level products usually available Data reduction software often specific to individual
Astrophysics with Terabyte Datasets. Alex Szalay, JHU and Jim Gray, Microsoft Research
Astrophysics with Terabyte Datasets Alex Szalay, JHU and Jim Gray, Microsoft Research Living in an Exponential World Astronomers have a few hundred TB now 1 pixel (byte) / sq arc second ~ 4TB Multi-spectral,
ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS)
ASKAP Science Data Archive: Users and Requirements CSIRO ASTRONOMY AND SPACE SCIENCE (CASS) Jessica Chapman, Data Workshop March 2013 ASKAP Science Data Archive Talk outline Data flow in brief Some radio
Datamanagement at the European Southern Observatory: Strategies and Challenges. Michael Sterzik, ESO Datamanagement and Operations Division
Datamanagement at the European Southern Observatory: Strategies and Challenges Michael Sterzik, ESO Datamanagement and Operations Division European Southern Observatory - builds and operates state-of-the-art
EA-ARC ALMA ARCHIVE DATA USER GUIDEBOOK
EA-ARC ALMA ARCHIVE DATA USER GUIDEBOOK Prepared by James O. Chibueze NAOJ Chile Observatory Purpose of this handbook: This handbook is aimed at providing fundamental information on how and where to access
The Virtual Observatory: What is it and how can it help me? Enrique Solano LAEFF / INTA Spanish Virtual Observatory
The Virtual Observatory: What is it and how can it help me? Enrique Solano LAEFF / INTA Spanish Virtual Observatory Astronomy in the XXI century The Internet revolution (the dot com boom ) has transformed
Organization of VizieR's Catalogs Archival
Organization of VizieR's Catalogs Archival Organization of VizieR's Catalogs Archival Table of Contents Foreword...2 Environment applied to VizieR archives...3 The archive... 3 The producer...3 The user...3
NASA's Postdoctoral Fellowship Programs
NASA's Postdoctoral Fellowship Programs Einstein Fellowships Dr. Charles A. Beichman & Dr. Dawn M. Gelino NASA Exoplanet Science Institute Dr. Ron Allen Space Telescope Science Institute Dr. Andrea Prestwich
Kepler Data and Tools. Kepler Science Conference II November 5, 2013
Kepler Data and Tools Kepler Science Conference II November 5, 2013 Agenda Current and legacy data products (S. Thompson) Kepler Science Center tools (M. Still) MAST Kepler Archive (S. Fleming) NASA Exoplanet
Migrating a (Large) Science Database to the Cloud
The Sloan Digital Sky Survey Migrating a (Large) Science Database to the Cloud Ani Thakar Alex Szalay Center for Astrophysical Sciences and Institute for Data Intensive Engineering and Science (IDIES)
The Astronomical Data Warehouse
The Astronomical Data Warehouse Clive G. Page Department of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, UK Abstract. The idea of the astronomical data warehouse has arisen as the
on the establishment of a Brazilian Science Data Center (BSDC) General Guidelines
on the establishment of a Brazilian Science Data Center (BSDC) General Guidelines 1 Introduction Since the entrance of Brazil in ICRANet a variety of projects have been started to be developed a) in the
Are We Alone?! Exoplanet Characterization and Direct Imaging!
From Cosmic Birth to Living Earths A Vision for Space Astronomy in the 2020s and Beyond Are We Alone?! Exoplanet Characterization and Direct Imaging! A Study Commissioned by the Associated Universities
Data-Intensive Science and Scientific Data Infrastructure
Data-Intensive Science and Scientific Data Infrastructure Russ Rew, UCAR Unidata ICTP Advanced School on High Performance and Grid Computing 13 April 2011 Overview Data-intensive science Publishing scientific
In Memory Accelerator for MongoDB
In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000
CADC and CANFAR: Extending the role of the data centre. Séverin Gaudet Canadian Astronomy Data Centre
CADC and CANFAR: Extending the role of the data centre Séverin Gaudet Canadian Astronomy Data Centre February 2012 Canadian Astronomy Data Centre Heterogeneous collection: Multiple missions, facilities
WFC3 Image Calibration and Reduction Software
The 2010 STScI Calibration Workshop Space Telescope Science Institute, 2010 Susana Deustua and Cristina Oliveira, eds. WFC3 Image Calibration and Reduction Software Howard A. Bushouse Space Telescope Science
Constructing the Subaru Advanced Data and Analysis Service on VO
Constructing the Subaru Advanced Data and Analysis Service on VO Yuji Shirasaki on behalf of ADC National Astronomical Observatory of Japan Astronomy Data Center Contents Subaru Telescope and Instruments
Learning from Big Data in
Learning from Big Data in Astronomy an overview Kirk Borne George Mason University School of Physics, Astronomy, & Computational Sciences http://spacs.gmu.edu/ From traditional astronomy 2 to Big Data
AST 4723 Lab 3: Data Archives and Image Viewers
AST 4723 Lab 3: Data Archives and Image Viewers Objective: The purpose of the lab this week is for you to learn how to acquire images from online archives, display images using a standard image viewer
Data Mining Challenges and Opportunities in Astronomy
Data Mining Challenges and Opportunities in Astronomy S. G. Djorgovski (Caltech) With special thanks to R. Brunner, A. Szalay, A. Mahabal, et al. The Punchline: Astronomy has become an immensely datarich
How To Understand And Understand The Science Of Astronomy
Introduction to the VO [email protected] ESAVO ESA/ESAC Madrid, Spain The way Astronomy works Telescopes (ground- and space-based, covering the full electromagnetic spectrum) Observatories Instruments
and the VO-Science Francisco Jiménez Esteban Suffolk University
The Spanish-VO and the VO-Science Francisco Jiménez Esteban CAB / SVO (INTA-CSIC) Suffolk University The Spanish-VO (SVO) IVOA was created in June 2002 with the mission to facilitate the international
Mission Operations and Ground Segment
ESA Earth Observation Info Days Mission Operations and Ground Segment ESA EO Ground Segment and Mission Operations department (EOP-G) May 2013 EOEP 2013 Page 1 ESA Unclassified For Official Use MISSION
Data Lab System Architecture
Data Lab System Architecture Data Lab Context Data Lab Architecture Astronomer s Desktop Web Page Cmdline Tools Legacy Apps User Code User Mgmt Data Lab Ops Monitoring Presentation Layer Authentication
An IDL for Web Services
An IDL for Web Services Interface definitions are needed to allow clients to communicate with web services Interface definitions need to be provided as part of a more general web service description Web
The PACS Software System. (A high level overview) Prepared by : E. Wieprecht, J.Schreiber, U.Klaas November,5 2007 Issue 1.
The PACS Software System (A high level overview) Prepared by : E. Wieprecht, J.Schreiber, U.Klaas November,5 2007 Issue 1.0 PICC-ME-DS-003 1. Introduction The PCSS, the PACS ICC Software System, is the
Long Term Preservation of Earth Observation Space Data. Preservation Workflow
Long Term Preservation of Earth Observation Space Data Preservation Workflow CEOS-WGISS Doc. Ref.: CEOS/WGISS/DSIG/PW Data Stewardship Interest Group Date: March 2015 Issue: Version 1.0 Preservation Workflow
DAME Astrophysical DAta Mining Mining & & Exploration Exploration GRID
DAME Astrophysical DAta Mining & Exploration on GRID M. Brescia S. G. Djorgovski G. Longo & DAME Working Group Istituto Nazionale di Astrofisica Astronomical Observatory of Capodimonte, Napoli Department
The Murchison Widefield Array Data Archive System. Chen Wu Int l Centre for Radio Astronomy Research The University of Western Australia
The Murchison Widefield Array Data Archive System Chen Wu Int l Centre for Radio Astronomy Research The University of Western Australia Agenda Dataflow Requirements Solutions & Lessons learnt Open solution
AGILE Data Center at ASDC: Overview, Catalogs and Science Tools
AGILE Data Center at ASDC: Overview, Catalogs and Science Tools Carlotta Pittori, ASDC on behalf of the AGILE Data Center 12 th AGILE Science Workshop, May 8-9 2014 April 23, 2007: Launch! Equatorial orbit:
Swarthmore College Newsletter
93 Fog, clouds, and light pollution limit the effectiveness of even the biggest optical telescopes on Earth. Astronomers who study ultraviolet or X-ray emission of stars have been more limited because
Data and Information Management for EO Data Centers. Eberhard Mikusch German Aerospace Center - German Remote Sensing Data Center
Data and Information Management for EO Data Centers Eberhard Mikusch German Aerospace Center - Mexico, 23. 04 2008 Earth Observation System Environment at DLR/DFD 10010 00101 Radar 10010 00101 Atmospheric
Efficient data reduction and analysis of DECam images using multicore architecture Poor man s approach to Big data
Efficient data reduction and analysis of DECam images using multicore architecture Poor man s approach to Big data Instituto de Astrofísica Pontificia Universidad Católica de Chile Thomas Puzia, Maren
Digital Collections as Big Data. Leslie Johnston, Library of Congress Digital Preservation 2012
Digital Collections as Big Data Leslie Johnston, Library of Congress Digital Preservation 2012 Data is not just generated by satellites, identified during experiments, or collected during surveys. Datasets
NASA s Big Data Challenges in Climate Science
NASA s Big Data Challenges in Climate Science Tsengdar Lee, Ph.D. High-end Computing Program Manager NASA Headquarters Presented at IEEE Big Data 2014 Workshop October 29, 2014 1 2 7-km GEOS-5 Nature Run
Summary of Data Management Principles Dark Energy Survey V2.1, 7/16/15
Summary of Data Management Principles Dark Energy Survey V2.1, 7/16/15 This Summary of Data Management Principles (DMP) has been prepared at the request of the DOE Office of High Energy Physics, in support
NuSTAR Ground Systems and Operations Approach Lessons Learned
University of California, Berkeley Space Sciences Laboratory NuSTAR Ground Systems and Operations Approach Lessons Learned Manfred Bester, Bryce Roberts, Mark Lewis, and William Marchant Space Sciences
TELESCOPE AS TIME MACHINE
TELESCOPE AS TIME MACHINE Read this article about NASA s latest high-tech space telescope. Then, have fun doing one or both of the word puzzles that use the important words in the article. A TELESCOPE
Data Requirements from NERSC Requirements Reviews
Data Requirements from NERSC Requirements Reviews Richard Gerber and Katherine Yelick Lawrence Berkeley National Laboratory Summary Department of Energy Scientists represented by the NERSC user community
Using the Amazon Compute Cloud to Analyze Large Data Sets
Using the Amazon Compute Cloud to Analyze Large Data Sets The Panchromatic Hubble Andromeda Treasury Benjamin F. Williams University of Washington Movie by Anil Seth Ben Williams March 11, 2015 Hubble
Data analysis of L2-L3 products
Data analysis of L2-L3 products Emmanuel Gangler UBP Clermont-Ferrand (France) Emmanuel Gangler BIDS 14 1/13 Data management is a pillar of the project : L3 Telescope Caméra Data Management Outreach L1
SETTING UP AND RUNNING A WEB SITE ON YOUR LENOVO STORAGE DEVICE WORKING WITH WEB SERVER TOOLS
White Paper SETTING UP AND RUNNING A WEB SITE ON YOUR LENOVO STORAGE DEVICE WORKING WITH WEB SERVER TOOLS CONTENTS Introduction 1 Audience 1 Terminology 1 Enabling a custom home page 1 Adding webmysqlserver
The Bamberg photographic plate archive
The Bamberg photographic plate archive The digitizing project H. Edelmann, N. Jansen, U. Heber, H. Drechsel, J. Wilms, I. Kreykenbohm Dr. Karl Remeis-Observatory & ECAP Astronomical Institute Friedrich-Alexander-University
USE OF PYTHON AS A SATELLITE OPERATIONS AND TESTING AUTOMATION LANGUAGE
USE OF PYTHON AS A SATELLITE OPERATIONS AND TESTING AUTOMATION LANGUAGE Gonzalo Garcia VP of Operations, USA Property of GMV All rights reserved INTRODUCTION Property of GMV All rights reserved INTRODUCTION
The Tonnabytes Big Data Challenge: Transforming Science and Education. Kirk Borne George Mason University
The Tonnabytes Big Data Challenge: Transforming Science and Education Kirk Borne George Mason University Ever since we first began to explore our world humans have asked questions and have collected evidence
How to Ingest Data into Google BigQuery using Talend for Big Data. A Technical Solution Paper from Saama Technologies, Inc.
How to Ingest Data into Google BigQuery using Talend for Big Data A Technical Solution Paper from Saama Technologies, Inc. July 30, 2013 Table of Contents Intended Audience What you will Learn Background
2009 ikeep Ltd, Morgenstrasse 129, CH-3018 Bern, Switzerland (www.ikeep.com, [email protected])
CSP CHRONOS Compliance statement for ISO 14721:2003 (Open Archival Information System Reference Model) 2009 ikeep Ltd, Morgenstrasse 129, CH-3018 Bern, Switzerland (www.ikeep.com, [email protected]) The international
How To Build A Connector On A Website (For A Nonprogrammer)
Index Data's MasterKey Connect Product Description MasterKey Connect is an innovative technology that makes it easy to automate access to services on the web. It allows nonprogrammers to create 'connectors'
GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington
GEOG 482/582 : GIS Data Management Lesson 10: Enterprise GIS Data Management Strategies Overview Learning Objective Questions: 1. What are challenges for multi-user database environments? 2. What is Enterprise
What s new in Carmenta Server 4.2
What s new in Carmenta Server 4.2 A complete solution for cost-effective visualisation and distribution of GIS data through web services Carmenta Server provides cost-effective technology for building
Delivering the power of the world s most successful genomics platform
Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE
