NASA's Big Data Challenges in Climate Science




NASA's Big Data Challenges in Climate Science
Tsengdar Lee, Ph.D.
High-end Computing Program Manager, NASA Headquarters
Presented at IEEE Big Data 2014 Workshop, October 29, 2014


7-km GEOS-5 Nature Run: Global Tropical Cyclones

The GEOS-5 Nature Run must produce realistic tropical cyclone activity to be viable for tropical observing system simulation experiments. This includes realistic frequency and intraseasonal variability across the global basins, as well as intensities typically observed in nature.

This GEOS-5 Nature Run successfully reproduces typical tropical cyclone activity in all basins, including a large number of weak tropical storms as well as major hurricanes and typhoons.

Legend:
- Tropical Storm: winds 39-73 mph
- Hurricane: winds 74-111 mph
- Major Hurricane: winds 112+ mph

In this short period from September 7-12, 2006, during the GEOS-5 7-km Nature Run, two hurricanes spin through the east Pacific basin while a major Atlantic hurricane develops in the Gulf of Mexico, making landfall along the US Gulf coast as a category 3 hurricane with winds (color shading) in excess of 110 mph.
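The wind-speed legend on this slide can be read as a simple binning rule. The following is a minimal sketch of that rule, not code from the GEOS-5 project: the function name `classify_storm` and the label used for winds below 39 mph are assumptions made for illustration, and the thresholds are taken directly from the legend.

```python
def classify_storm(wind_mph: float) -> str:
    """Map a wind speed in mph to the category names used in the slide legend."""
    if wind_mph >= 112:
        return "Major Hurricane"
    if wind_mph >= 74:
        return "Hurricane"
    if wind_mph >= 39:
        return "Tropical Storm"
    # The legend has no bin below 39 mph; this label is a placeholder.
    return "Below tropical-storm strength"


if __name__ == "__main__":
    for speed in (35, 50, 90, 115):
        print(speed, "mph ->", classify_storm(speed))
```

Counting how many simulated storms fall into each bin per basin is one way such a rule could support the frequency and intensity comparison the slide describes.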

Projected Data Holding

- By 2020 it is estimated that all climate data holdings, including simulation, observation, and reanalysis sources, will grow to hundreds of exabytes in a worldwide-federated network [CKD Workshop, 2011 and CCDC Workshop, 2011].

References:
- CCDC Workshop, International Workshop on Climate Change Data Challenges, June 2011, http://www.wikiprogress.org/index.php/Event:International_Workshop_on_Climate_change_data_challenges
- CKD Workshop, Climate Knowledge Discovery Workshop, March 2011, DKRZ, Hamburg, Germany, https://redmine.dkrz.de/collaboration/projects/ckd-workshop/wiki/CKD_2011_Hamburg
- Climate Data Challenges in the 21st Century, Jonathan T. Overpeck, et al. Science 331, 700 (2011); DOI: 10.1126/science.1197869

Credit: LLNL/Dean Williams

Turning Observations into Knowledge and Decision Products

Data Acquisition to Data Access

[Figure: end-to-end data flow diagram]

Stages:
- Data Acquisition
- Flight Operations, Data Capture, Initial Processing & Backup Archive
- Data Transport to DAACs
- Science Data Processing, Data Mgmt., Data Archive & Distribution
- Distribution, Access, Interoperability & Reuse

Diagram labels: Spacecraft; Tracking & Data Relay Satellite (TDRS); Ground Stations; Polar Ground Stations; Data Processing & Mission Control; NASA Integrated Services Network (NISN) Mission Services; Science Data Systems (DAACs, NSSDC); Science Teams; Measurement Teams; WWW; IP Internet; Value-Added Providers; Interagency Data Centers; International Partners; Use in Earth & Space Models; Research; Education; Benchmarking; DSS.

Technology: MEASURE, AIST, CMAC, ACCESS

Scientific IT Requirements

Scientists and engineers often:
- need computing services to perform data analysis, theory verification, and predictions
- move large volumes of data to and from data centers and to and from compute centers
- need to communicate, collaborate, and share data with external (e.g. university) investigators
- require high-speed connections and high-speed computing platforms beyond business administration requirements
- require local disk storage and visualization hardware and software

Typical Data Analysis and Data Processing Workloads

- A scientist or engineer queries a metadata server for the data and orders the data from a data center.
- The data center fulfills the order by preparing (subsetting, resampling, averaging, etc.) the data and puts the result on an FTP server.
- After receiving a notification from the data center, the investigator goes to the FTP server and fetches the data.
- Data is transmitted to the investigator's institution and stored on local storage.
- The investigator processes and analyzes the data locally using local computing resources.
- Some of the processed data will have to be transmitted back to the data center.
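The "preparing" step in this workflow (subsetting and averaging) can be illustrated with a small sketch. It is not the code of any NASA data center: the helper names `subset` and `area_mean`, the toy grid, and the coordinate values are all assumptions made for illustration, and the mean is deliberately unweighted, whereas a production service would likely weight cells by area.

```python
from statistics import mean


def subset(grid, lats, lons, lat_range, lon_range):
    """Keep only the grid cells whose coordinates fall inside the bounding box."""
    rows = [i for i, lat in enumerate(lats) if lat_range[0] <= lat <= lat_range[1]]
    cols = [j for j, lon in enumerate(lons) if lon_range[0] <= lon <= lon_range[1]]
    return [[grid[i][j] for j in cols] for i in rows]


def area_mean(grid):
    """Unweighted mean over every cell of a 2-D grid (list of rows)."""
    return mean(value for row in grid for value in row)


# Toy 4 x 3 temperature field on a coarse latitude/longitude grid.
lats = [0.0, 10.0, 20.0, 30.0]
lons = [-100.0, -90.0, -80.0]
temps = [
    [30.0, 31.0, 32.0],
    [28.0, 29.0, 30.0],
    [26.0, 27.0, 28.0],
    [20.0, 21.0, 22.0],
]

# Order only the cells between 10N-30N and 100W-90W, then average them.
box = subset(temps, lats, lons, (10.0, 30.0), (-100.0, -90.0))
print(box)
print(area_mean(box))
```

In the workflow above, only the subset (or its average) would be staged on the FTP server, which is what keeps the transfer to the investigator's institution smaller than the full archive.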

Before Big Data Analogy and Challenges

Analogy:

Challenges:
- Stewardship
- Curation
- Indexing
- Cataloging
- Searching
- Ordering
- Subsetting
- Provenance
- Lineage
- Data Mining
- Dissemination

Gearing up for Big Data Analytics

- Traditional data centers focus on data archive, access, and distribution
  - Scientists typically order and download specific data sets to a local machine to perform analysis
  - With large amounts of observational and modeling data, downloading to a local machine is becoming inefficient
  - Data centers are starting to provide additional services for data analysis
- NASA's computing and computational science program is building data analytics platforms using Climate Analytics as a Service (CAaaS), such as NASA Earth Exchange (NEX) and Observation for Model Intercomparison Project (Obs4MIPs) using Earth System Grid Federation (ESGF)
- Built on iRODS, SciDB, Hadoop file system, MapReduce, Apache OODT, and Apache Open Climate technologies
- Enabled by a rule-based data management system
- Current research focuses on how to manage data movement from the archives to the analytical platforms
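The MapReduce pattern named on this slide can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce phases under stated assumptions: it does not use the Hadoop API, and the function names, the record layout (ISO date string plus temperature), and the sample values are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (month, temperature) pair per daily record."""
    for date, temperature in records:
        yield date[:7], temperature  # key is "YYYY-MM"


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group of values to its mean."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Hypothetical daily records held next to the archive.
records = [
    ("2006-09-07", 28.0),
    ("2006-09-08", 30.0),
    ("2006-10-01", 24.0),
]

monthly_means = reduce_phase(shuffle(map_phase(records)))
print(monthly_means)
```

The design point is that the daily records stay on the analytics platform and only the much smaller dictionary of monthly means needs to travel to the scientist, which addresses the download inefficiency noted above.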

Functional Architecture

Challenges in Big Data Analytics

Challenges:
- Remote and local data visualization
- Server-side processing
- Distributed data analysis
- Data on-boarding
- ETL
- High-speed network
- Data management
- Data storage

Open Source Strategy

Science Infosystem

[Figure: layered architecture diagram, top to bottom]

- Science Discovery: Observations, Theory, Modeling
- Science Applications: Complete models, data analyses, OSSEs, sensor webs, virtual observatories
- Science Architectures: Science frameworks / services (ESMF, POOMA, SWMF, Curator, Workflow), Data Management
- Science Elements: Elements of numerical models, data services (dynamical cores, matrix solvers, subsetting, etc.)
- IT Middleware / Services: IT Security, Grid Utilities, Cloud Computing, OS, Batch Scheduling, Help Desk
- Science Hardware Foundation: High Performance Computing, Cloud Computing, Storage, Networks, and Data Centers

Future Directions and Challenges

- Scale with Big Data produced by higher resolution models, satellites, and instruments
- Expand server-side functionality
  - Server-side processing through WPS (climate indexes, custom algorithms)
  - GIS mapping services (for climate change impact studies at regional and local scale)
  - Facilitate model to observations inter-comparison
- Expand direct client access capabilities
  - Increased support for OPeNDAP based access
  - Track provenance of complex processing workflows for reproducibility and repeatability
  - Controlled Vocabularies
- Package VMs for Cloud deployment
  - Instantiate ESGF nodes on demand for short lifetime projects
  - Environment with elastic allocation of back-end storage and computing resources
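OPeNDAP-based access lets a client request only part of a remote array by appending a constraint expression to the dataset URL. The sketch below builds such a URL, assuming the DAP2 hyperslab form `[start:stride:stop]` with inclusive indices. The host `example.org`, the dataset path, and the variable name `tas` are hypothetical, and some clients may additionally require the brackets to be percent-encoded.

```python
def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    """Format one dimension of a DAP2 hyperslab as [start:stride:stop]."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("expected 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def dap2_url(base_url: str, variable: str, slabs, suffix: str = ".ascii") -> str:
    """Append a response suffix and a constraint expression to a dataset URL.

    `slabs` holds one (start, stop) or (start, stop, stride) tuple per dimension.
    """
    constraint = variable + "".join(hyperslab(*slab) for slab in slabs)
    return f"{base_url}{suffix}?{constraint}"


# Hypothetical dataset: first time step, a 11 x 11 latitude/longitude window.
url = dap2_url(
    "http://example.org/opendap/tas_monthly.nc",  # placeholder, not a real server
    "tas",
    [(0, 0), (10, 20), (30, 40)],
)
print(url)
```

Requesting a window this way moves the subsetting to the server, in line with the server-side and direct client access goals listed on the slide.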

Research Opportunities

NASA Research Opportunities in Space and Earth Sciences (ROSES) solicitations include:

1. Data systems, data management, access, and data processing
   - Making Earth System data records for Use in Research Environments (MEaSUREs)
   - Advancing Collaborative Connections for Earth System Science (ACCESS)
2. Advanced Information System Technology
3. Modeling and Data Assimilation Research:
   - Earth science Modeling and Assimilation Program (MAP)
   - Atmospheric Composition: Modeling and Analysis
   - Heliophysics, Astrophysics Theory Programs
4. Computational Modeling Algorithms and Cyberinfrastructure (CMAC) Program
5. ROSES Solicitation Web site (enter ROSES in the keywords field): http://nspires.nasaprs.com/external/solicitations/solicitations.do?method=open&stack=push

Thank You!

Tsengdar Lee, Ph.D.
High-end Computing Program Manager
Weather Focus Area Program Scientist
NASA Headquarters
tsengdar.lee@nasa.gov