AMLIGHT, Simulation Datasets, and Global Data Sharing
1 AMLIGHT, Simulation Datasets, and Global Data Sharing
Jean-Bernard Minster (1,2,4,6), John J. Helly (1,2), Steven M. Day (3,4), Raul Castro Escamilla (5), Philip Maechling (4), Thomas H. Jordan (4), Amit Chourasia (2,4), Mustapha Mokrane (6)
(1) SIO, (2) SDSC, (3) SDSU, (4) SCEC, (5) CICESE, (6) ICSU-WDS
AMLIGHT, Big Data, Big Network, CICESE
2 Open data
Many countries have adopted an open data policy, at least for research and education (e.g. US, France, UK, ZA).
This often includes the output of numerical models and simulations.
But because laws differ from country to country, large international organizations discuss principles instead of policy.
3 Data Sharing Policy
ICSU World Data Centers ( )
Federation of Astronomical and Geophysical Data Analysis Services ( )
Full and open access to data
Long-term data stewardship and curation
4 Data Sharing Principles
Group on Earth Observations (GEO, 130+ nations) / Global Earth Observation System of Systems (GEOSS) present.
Equitable, unimpeded access to data for research and education
Long-term data preservation
Many exceptions (national security, privacy laws, commercial protection, ecological protection)
5 Data Sharing Policy
ICSU World Data System Data Policy (2008-present)
Full and open access to data
Long-term data stewardship
6 WDS Data Policy
7 Research Data Alliance and WDS (RDA/WDS, 2013)
Include socio-economic, health, and other data in policy discussions
Explore data publishing concepts and issues
Collaboration with publishers
8 This works for observational data in the natural sciences, especially environmental data, that can never be acquired again.
Perhaps also for socio-economic and human health data sets (with caveats, such as aggregation).
9 The Environmental Information System Tree (figure, credited to Francis Bretherton)
Labels recovered from the diagram: Observations & Data Collection; International Networks; Measurement Systems; National Supplements; Archive; Quality Assurance; Integration & Validation; Models & Analysis Centers; Synthesized Core Products; Distribution & Use; End Users; Private Sector; Under Development.
Legend: End user (public); End user (private); Distribution (full & open); Distribution (proprietary); Public data; Data buy.
10 What about numerical simulation outputs?
Issues are many, and difficult, e.g.:
Volume (can be enormous)
Quality (how is it measured and controlled?)
Metadata (what should be included?)
Costs (is it cheaper to re-compute?)
Needs (longitudinal studies vs. one-off studies)
Requirements for data assimilation
Examples: weather prediction, climate simulations, earthquake simulations, earthquake prediction algorithms
This calls for a broad discussion.
11 Minimalist Metadata (automatic capture)
Code version
HW platform (e.g. CPU, GPU, word length)
SW platform (e.g. compiler, options)
Input and runtime options (workflow?)
Other (author, etc.; Dublin Core)
Even then, output might not be duplicated in a future rerun.
Many numerical outputs become obsolete.
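The metadata categories on this slide could be captured automatically at run time along the following lines. This is only a minimal sketch using the Python standard library; the function name `capture_run_metadata`, the record layout, and the example values are hypothetical and are not taken from the presentation. GPU details are not captured here, and the pointer size of the running interpreter is used as a stand-in for word length.

```python
import json
import platform
import struct
from datetime import datetime, timezone


def capture_run_metadata(code_version, runtime_options, creator):
    """Build a minimal metadata record for one simulation run.

    All field names are illustrative assumptions, not a standard schema.
    """
    return {
        # Code version: supplied by the caller (e.g. a release tag).
        "code_version": code_version,
        # HW platform: what the standard library can report about the host.
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
            # Pointer size in bits of this interpreter build (proxy for word length).
            "word_length_bits": struct.calcsize("P") * 8,
        },
        # SW platform: operating system, interpreter, and the compiler it was built with.
        "software": {
            "os": platform.platform(),
            "interpreter": platform.python_implementation() + " " + platform.python_version(),
            "compiler": platform.python_compiler(),
        },
        # Input and runtime options, copied so later changes do not alter the record.
        "runtime_options": dict(runtime_options),
        # "Other": two Dublin Core style elements (creator, date).
        "dublin_core": {
            "creator": creator,
            "date": datetime.now(timezone.utc).isoformat(),
        },
    }


if __name__ == "__main__":
    # Hypothetical example values.
    record = capture_run_metadata("v1.0-example", {"dt": 0.01, "steps": 1000}, "A. Researcher")
    print(json.dumps(record, indent=2))
```

Writing such a record next to every output file costs little, though, as the slide notes, it still does not guarantee that a later rerun reproduces the output.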
12 Example TeraShake Simulation (2004)
13 Example M8 Simulation (2010)
14 TeraShake vs. M8 comparison
                    TeraShake           M8                                  Notes
Dimensions          600x300x80 km       810x405x85 km
# cells
Time step           sec                 sec.
# steps (Duration)  20, sec.            160, sec.
# cores             240 (Datastar)      223,074 (CPU); 16,600 (GPU)
Wall clock          5 days              24 hours (CPU) *; 5 hours (GPU) **  * 220 Tflop/s; ** 2.3 Pflop/s
Checkpoints         Every 1,000th step  Every 20,000th step
Checkpoints, each   150 Gbytes          32 Tbytes                           Cannot transfer
Checkpoints, total  3 Tbytes            192 Tbytes *                        * Every 4 hrs
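The checkpoint figures in this comparison can be cross-checked with a few lines of arithmetic. The sketch below uses only the sizes quoted on the slide, with decimal units assumed. The link rates of 10 and 100 Gb/s are hypothetical illustrations, not AMLIGHT's stated capacity, and the transfer time is an ideal figure with no protocol overhead.

```python
GB = 1e9   # bytes; decimal units are an assumption, the slide does not specify
TB = 1e12


def checkpoint_count(total_bytes, each_bytes):
    """Number of checkpoints implied by the total and per-checkpoint sizes."""
    return round(total_bytes / each_bytes)


def transfer_hours(size_bytes, link_gbps):
    """Ideal time in hours to move size_bytes at a sustained rate of link_gbps Gb/s."""
    seconds = size_bytes * 8 / (link_gbps * 1e9)
    return seconds / 3600


# TeraShake: 3 Tbytes total at 150 Gbytes each.
terashake_checkpoints = checkpoint_count(3 * TB, 150 * GB)

# M8: 192 Tbytes total at 32 Tbytes each; this also equals one checkpoint
# every 4 hours over the 24-hour CPU run.
m8_checkpoints = checkpoint_count(192 * TB, 32 * TB)

if __name__ == "__main__":
    print("TeraShake checkpoints:", terashake_checkpoints)
    print("M8 checkpoints:", m8_checkpoints)
    for gbps in (10, 100):  # hypothetical link rates
        hours = transfer_hours(32 * TB, gbps)
        print(f"One 32 TB M8 checkpoint at {gbps} Gb/s: {hours:.2f} h")
```

Under these assumptions, a single 32 Tbyte checkpoint takes longer than the 4-hour checkpoint interval to move at 10 Gb/s, which is one way to read the "Cannot transfer" note.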
15 TeraShake vs. M8 comparison
Surface velocity vector field
  TeraShake: all nodes, every step: 1.1 TB
  M8: every other node, every 20th step: 4.4 TB (out of 352 TB)
Total volume velocity field, all nodes, all steps
  TeraShake: 432 Tbytes
  M8: 384 Pbytes
Volume velocity field, decimated
  TeraShake: all nodes, every 10th step: 45 Tbytes **
  M8: every other node, every 20th step: 4.8 Pbytes
  Notes: resolution OK for visualization. ** No longer usefully readable, because of tape read errors
Typical viz. movie
  TeraShake: < 100 Gbytes
  M8: < 100 Gbytes
  Notes: interactive viz. possible
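The decimated sizes on this slide follow from dividing the full output by the spatial and temporal decimation factors. The sketch below is a hedged illustration: the function `decimated_size` and its parameters are hypothetical, and the reading that "every other node" applies along two axes is an assumption, chosen because it reproduces the M8 figures quoted on the slide.

```python
def decimated_size(total, node_stride, decimated_axes, step_stride):
    """Size left after keeping every node_stride-th node along decimated_axes
    spatial axes and every step_stride-th time step. Units follow `total`."""
    return total / (node_stride ** decimated_axes * step_stride)


# M8 surface field: 352 TB in full, every other node (assumed on 2 axes), every 20th step.
m8_surface_tb = decimated_size(352, node_stride=2, decimated_axes=2, step_stride=20)

# M8 volume field: 384 PB in full, same decimation assumption.
m8_volume_pb = decimated_size(384, node_stride=2, decimated_axes=2, step_stride=20)

# TeraShake volume field: 432 TB in full, all nodes kept, every 10th step.
terashake_volume_tb = decimated_size(432, node_stride=1, decimated_axes=0, step_stride=10)

if __name__ == "__main__":
    print(f"M8 surface, decimated: {m8_surface_tb:.1f} TB")
    print(f"M8 volume, decimated: {m8_volume_pb:.1f} PB")
    print(f"TeraShake volume, decimated: {terashake_volume_tb:.1f} TB")
```

With these assumptions the two M8 values match the slide, while the TeraShake value comes out slightly below the 45 Tbytes quoted there, so the slide's figure is presumably rounded or includes data not modeled here.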
16 So what to save?
Possible strategy:
Only save enough to allow interactive (user- or purpose-specific) visualization, and use checkpoints to restart partial calculations. This works for one-off simulations (e.g. 1-day weather, single earthquake). AMLIGHT permits that.
Save selected individual visualizations that characterize the run (small data sets). AMLIGHT makes it easy.
For long-term longitudinal research, such as climate research or earthquake prediction algorithms, some output may require long-term curation by a trusted repository. This must be discussed on a case-by-case basis. AMLIGHT makes the data repository look proximal.
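Restarting a partial calculation from checkpoints amounts to finding the last checkpoint at or before the time step of interest and recomputing only the steps in between. The following is a minimal sketch of that bookkeeping; the function name `restart_plan` and the example step numbers are hypothetical, and it assumes checkpoints are written at a fixed step interval starting from step 0.

```python
def restart_plan(target_step, checkpoint_interval):
    """Return (checkpoint_step, steps_to_recompute) for reaching target_step.

    Assumes a checkpoint exists at every multiple of checkpoint_interval,
    including step 0 (the initial state).
    """
    if target_step < 0:
        raise ValueError("target_step must be non-negative")
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    # Last checkpoint at or before the target step.
    checkpoint_step = (target_step // checkpoint_interval) * checkpoint_interval
    return checkpoint_step, target_step - checkpoint_step


if __name__ == "__main__":
    # Hypothetical request: state at step 12,345 with a checkpoint every 1,000th step.
    start, recompute = restart_plan(12_345, 1_000)
    print(f"Restart from checkpoint at step {start}, recompute {recompute} steps")
```

The sparser the checkpoints, the more steps must be recomputed per request, which is the trade-off between stored volume and recomputation cost raised earlier in the talk.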
17 TeraShake Visualization
Emmett McQuinn, Amit Chourasia
GlyphSea-720p-cbr6.mp4
18 M8 Visualization
MachCone-1600m stepintervalGlyphSea_1280.mov
