The Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT): A Vision for Large-Scale Climate Data

Similar documents
VisIt: A Tool for Visualizing and Analyzing Very Large Data. Hank Childs, Lawrence Berkeley National Laboratory December 13, 2010

Exploratory Climate Data Visualization and Analysis

Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization Hank Childs + D. Pugmire, D. Camp, C. Garth, G.

Parallel Analysis and Visualization on Cray Compute Node Linux

SciDAC Visualization and Analytics Center for Enabling Technology

The Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT): Data Analysis and Visualization for Geoscience Data

Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT) Final Report

NASA's Strategy and Activities in Server Side Analytics

Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory

Modernizing Simulation Input Generation and Post-Simulation Data Visualization with Eclipse ICE

Hank Childs, University of Oregon

VisIt: An End-User Tool For Visualizing and Analyzing Very Large Data

Provenance for Visualizations

Using Python in Climate and Meteorology

Parallel Large-Scale Visualization

VisIt Visualization Tool

Visualization of Adaptive Mesh Refinement Data with VisIt

HPC & Visualization. Visualization and High-Performance Computing

Data-Intensive Science and Scientific Data Infrastructure

Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising

Briefing for NFAC February 5, Sharlene Weatherwax Associate Director Office of Biological and Environmental Research U.S. Department of Energy

Part I Courses Syllabus

Visualization and Data Analysis

The Flexible Climate Data Analysis Tools (CDAT) for Multimodel Climate Simulation Data

Cisco UCS and Quantum StorNext: Harnessing the Full Potential of Content

Databricks. A Primer

Introduction to Visualization with VTK and ParaView

Make the Most of Big Data to Drive Innovation Through Reseach

Processing Data with rsmap3d Software Services Group Advanced Photon Source Argonne National Laboratory

Seamless Web Data Entry for SAS Applications D.J. Penix, Pinnacle Solutions, Indianapolis, IN

IDL. Get the answers you need from your data. IDL

Address IT costs and streamline operations with IBM service request and asset management solutions.

Harnessing the power of advanced analytics with IBM Netezza

Solution White Paper Connect Hadoop to the Enterprise

Three Open Blueprints For Big Data Success

PyCompArch: Python-Based Modules for Exploring Computer Architecture Concepts

zen Platform technical white paper

Big Data Management in the Clouds and HPC Systems

Information Visualization WS 2013/14 11 Visual Analytics

Key Attributes for Analytics in an IBM i environment

Managing Large Imagery Databases via the Web

Executive summary. Table of Contents. Technical Paper Minimize program coding and reduce development time with Infor Mongoose

The ORIENTGATE data platform

ORACLE S PRIMAVERA FEATURES PORTFOLIO MANAGEMENT. Delivers value through a strategy-first approach to selecting the optimum set of investments

Some programming experience in a high-level structured programming language is recommended.

COMP/CS 605: Intro to Parallel Computing Lecture 01: Parallel Computing Overview (Part 1)

Databricks. A Primer

3rd Annual Earth System Grid Federation and Ultrascale Visualization Climate Data Analysis Tools Face-to-Face Meeting Report December 2013

Easier - Faster - Better

Luncheon Webinar Series May 13, 2013

LLNL Redefines High Performance Computing with Fusion Powered I/O

Big Workflow: More than Just Intelligent Workload Management for Big Data

Web Based Visualization Tool for Climate Data Using Python. Hannah Aizenman and Michael Grossberg. David Jones and Nick Barnes

Visualization with ParaView. Greg Johnson

Virtualization s Evolution

An Interactive 3D Visualization Tool for Large Scale Data Sets for Quantitative Atom Probe Tomography

HPC Wales Skills Academy Course Catalogue 2015

ICT Perspectives on Big Data: Well Sorted Materials

NA-ASC-128R-14-VOL.2-PN

Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations

NASA s Big Data Challenges in Climate Science

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Integrated Open-Source Geophysical Processing and Visualization

The Scientific Data Mining Process

Visualization with ParaView

CHAPTER FIVE RESULT ANALYSIS

Microsoft Dynamics AX 2012 A New Generation in ERP

Fast Multipole Method for particle interactions: an open source parallel library component

An Integrated Simulation and Visualization Framework for Tracking Cyclone Aila

COMOS Platform. Worldwide data exchange for effective plant management.

Integrated Information Management System, Development of Web Interface, a.k.a. Online Data Portal (ODP)

James Ahrens, Berk Geveci, Charles Law. Technical Report

Post-processing and Visualization with Open-Source Tools. Journée Scientifique Centre Image April 9, Julien Jomier

How to Design and Create Your Own Custom Ext Rep

Large-Data Software Defined Visualization on CPUs

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

2012 LABVANTAGE Solutions, Inc. All Rights Reserved.

MEng, BSc Applied Computer Science

Data Deduplication in a Hybrid Architecture for Improving Write Performance

Scientific Visualization with Open Source Tools. HM 2014 Julien Jomier

Transcription:

The Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT): A Vision for Large-Scale Climate Data Lawrence Livermore National Laboratory? Hank Childs (LBNL) and Charles Doutriaux (LLNL) September 22, 2011 On behalf of the UV-CDAT and Data Explorer Teams 10-05 team responsible for developing and deploying UV-CDAT 10-05 team responsible for developing and deploying large data climate analysis

UV-CDAT UV-CDAT = Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT) Goal: robust tool, capable of doing powerful analysis on large climate data sets. Collaboration between two 10-05 teams One tool that incorporates many packages including VisIt Focal point of Data Explorer team

VisIt is an open source, richly featured, turn-key application for large data. Used by: Visualization experts Simulation code developers Simulation code consumers Popular R&D 100 award in 2005 Used on many of the Top500 >>>100K downloads Developed by: NNSA, SciDAC, NEAMS, NSF, and more 217 pin reactor cooling simulation Run on ¼ of Argonne BG/P Image credit: Paul Fischer, ANL

VisIt recently demonstrated good performance at unprecedented scale. Weak scaling study: ~62.5M cells/core Machine Franklin Dawn JaguarPF Juno Purple Ranger Model Cray XT4 IBM P5 Sun Problem Size 1T, 2T #cores 16K, 32K VisIt s BG/P data processing 4T 64K techniques are more Cray than XT5 scalability 2T at massive 32K concurrency; X86_64 we are leveraging 1T a 16K suite of techniques developed over the last decade by VACET, 0.5T 8K the 1T NNSA, and 16K more. Two trillion cell data set, rendered in VisIt by David Pugmire on ORNL Jaguar 4

Production visualization tools use pure parallelism to process data. Parallel Simulation Code P1 P2 P4 P0 P3 P8 P7 P6 P5 P9 P0 P1 P2 P3 Parallelized visualization data flow network Read Process Render This is a good approach for high resolution meshes but climate data is different Processor 0 Read Process Render Processor 1 P4 P8 P5 P9 P6 P7 Pieces of data (on disk) Read Process Render Processor 2

Parallelizing Processing Over Time Slices: Improving Performance For Climate Data Objective VisIt s parallel processing techniques were designed for single time slices of very high resolution meshes. We must adapt this approach for the lower resolution and high temporal frequency characteristic of climate data. Progress & Results We have modified VisIt s underlying infrastructure to have a parallelize over time processing mode. We implemented a simple algorithm ( maximum value over time ) as a proof of concept Concurrency Averag e Time P0 P1 P2 VisIt parallelizing over a high resolution spatial mesh Speedup 1 480.0s - 2 223.0s 2.1X 4 120.0s 4.0X 8 62.0s 7.8X 16 29.0s 16.5X 32 15.5s 31.0X 64 7.8s 61.5X 128 4.2s 114X Scaling on 2130 time slices of NetCDF climate data (source: Wehner) Impact T=0 T=1 T=2 T=3 T=4 T=5 T=6 T=7 T=8 P0 P1 P2 VisIt parallelizing temporally over a low resolution mesh Climate scientists with access to parallel resources for their data will be able to process their data significantly faster through parallelization. This software investment will enable other algorithms developed by the project to also be accelerated through parallelization.

VisIt is used to look at simulated and experimental data from many areas. Fusion, Sanderson, UUtah Astrophysics, Childs Nuclear Reactors, Childs Particle accelerators, Ruebel, LBNL

General-purpose tools vs application-specific tools General-purpose tools: Are developed by large teams, leading to: Robustness Efficient algorithms Rich set of features But: They aren t streamlined for a given application area (lots of button clicks) They don t have application-specific methods So which do users want? Application-specific tools: Are made specifically to solve your problem: Streamlined user interface Application-specific analysis But, they have smaller user and developer communities, so they are: Less robust Less efficient algorithms Smaller set of features

Amazing developments over the last decade Very useful packages now available that make quick tool development possible: Python, R, VTK, Qt But this doesn t solve the large data issue. but tools now are available that do that as well: VisIt & ParaView VisTrails Great idea: put all these products together into one tool. Users get: The robustness, richness, and efficiency of large development efforts Streamlined user interface & climate-specific analysis (via CDAT) This tool is UV-CDAT.

CDAT Designed for climate science data, CDAT was first released in 1997 Based on the object-oriented Python computer language Added Packages that are useful to the climate community and other geophysical sciences Climate Data Management System (CDMS) NumPy / Masked Array / Metadata Visualization (VCS, IaGraphics, Xmgrace, Matplotlib, VTK, Visus, etc.) Graphical User Interface (VCDAT) XML representation (CDML/NcML) for data sets One environment from start to finish Integrated with other packages (i.e., LAS, OPeNDAP, ESG, etc.) Community Software (BSD open source license) URL: http://www-pcmdi.llnl.gov/software-portal (CDAT Plone site)

ParaView: vtdv3d, Scientific Data Analysis and Visualization High level analysis and visualization modules for UV- CDAT workflows. Encapsulate complex data visualization and processing operations. At a level of complexity appropriate for scientists. Interactive configuration enabling data exploration (with provenance). 11

VisTrails VisTrails is an open-source scientific workflow and provenance management system developed at the University of Utah that provides support for data exploration and visualization. Whereas workflows have been traditionally used to automate repetitive tasks, for applications that are exploratory in nature, such as simulations, data analysis and visualization, very little is repeated---change is the norm. As an engineer or scientist generates and evaluates hypotheses about data under study, a series of different, albeit related, workflows are created while a workflow is adjusted in an interactive process. VisTrails was designed to manage these rapidly-evolving workflows. A key distinguishing feature of VisTrails is a comprehensive provenance infrastructure that maintains detailed history information about the steps followed and data derived in the course of an exploratory task: VisTrails maintains provenance of data products, of the workflows that derive these products and their executions. This information is persisted as XML files or in a relational database, and it allows users to navigate workflow versions in an intuitive way, to undo changes but not lose any results, to visually compare different workflows and their results, and to examine the actions that led to a result. It also enables a series operations and user interfaces that simplify workflow design and use, including the ability to create and refine workflows by analogy and to query workflows by example.

UV-CDAT Architecture Layers VisTrails, CDAT, Python are key elements to integrating these packages. Presentation by Claudio Silva @ 11:30. 13

Integrated UV-CDAT GUI: Project, Plot, and Variables View

Lots of work to do for Data Explorer effort Lots of software engineering to get pieces working together: Integration of VisIt and R Improved integration of VisIt and VisTrails Connect VisIt to UV-CDAT (Upgrade to VTK 5.6) Incorporate analysis work described by Wes (cyclone detection, atmospheric rivers)

Summary UV-CDAT: tool being developed by two 10-05 teams. Goal: robust tool, capable of doing powerful analysis on large climate data sets. Its development is significantly leveraged by many existing packages. The Data Explorer team is focusing on deploying VisIt and R as part of UV-CDAT. VisIt is excellent with large data and has been adapted to work with climate data. Our SWE effort is now to integrate with UV-CDAT We also are collaborating on cutting edge analysis techniques with climate scientists (Bethel talk)

Example climate visualizations with VisIt