Visualization and Data Analysis



Similar documents
Solvers, Algorithms and Libraries (SAL)

Exascale Computing Project (ECP) Update

Data Centric Systems (DCS)

Parallel Analysis and Visualization on Cray Compute Node Linux

Hank Childs, University of Oregon

NA-ASC-128R-14-VOL.2-PN

So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell

The Design and Implement of Ultra-scale Data Parallel. In-situ Visualization System

HPC & Visualization. Visualization and High-Performance Computing

Post-processing and Visualization with Open-Source Tools. Journée Scientifique Centre Image April 9, Julien Jomier

HPC technology and future architecture

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Petascale Visualization: Approaches and Initial Results

Parallel Large-Scale Visualization

Towards Scalable Visualization Plugins for Data Staging Workflows

Performance Monitoring of Parallel Scientific Applications

Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization Hank Childs + D. Pugmire, D. Camp, C. Garth, G.

PACE Predictive Analytics Center of San Diego Supercomputer Center, UCSD. Natasha Balac, Ph.D.

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

The Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT): A Vision for Large-Scale Climate Data

Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED

Energy Efficient MapReduce

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect

The THREDDS Data Repository: for Long Term Data Storage and Access

Large-Data Software Defined Visualization on CPUs

Oracle Spatial and Graph. Jayant Sharma Director, Product Management

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Part I Courses Syllabus

BSC vision on Big Data and extreme scale computing

Overlapping Data Transfer With Application Execution on Clusters

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Tableau Server Scalability Explained

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC

Big Data Management in the Clouds and HPC Systems

Mission Need Statement for the Next Generation High Performance Production Computing System Project (NERSC-8)

Data Requirements from NERSC Requirements Reviews

Mitglied der Helmholtz-Gemeinschaft. System monitoring with LLview and the Parallel Tools Platform

Testing Automation for Distributed Applications By Isabel Drost-Fromm, Software Engineer, Elastic

7 things to ask when upgrading your ERP solution

HPC enabling of OpenFOAM R for CFD applications

Introduction to grid technologies, parallel and cloud computing. Alaa Osama Allam Saida Saad Mohamed Mohamed Ibrahim Gaber

Data Centric Interactive Visualization of Very Large Data

Scientific Visualization with Open Source Tools. HM 2014 Julien Jomier

MEng, BSc Applied Computer Science

FY 2016 Laboratory Directed Research and Development (LDRD) Program

Big Data. George O. Strawn NITRD

Machine Learning for Data-Driven Discovery

Software Life-Cycle Management

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])

SPATIAL DATA CLASSIFICATION AND DATA MINING

Data Deduplication in a Hybrid Architecture for Improving Write Performance

Cluster, Grid, Cloud Concepts

Xeon+FPGA Platform for the Data Center

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Data Semantics Aware Cloud for High Performance Analytics

In-Database Analytics

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales

OS/Run'me and Execu'on Time Produc'vity

The GPU Accelerated Data Center. Marc Hamilton, August 27, 2015

UNCLASSIFIED. R-1 Program Element (Number/Name) PE A / High Performance Computing Modernization Program. Prior Years FY 2013 FY 2014 FY 2015

:Introducing Star-P. The Open Platform for Parallel Application Development. Yoel Jacobsen E&M Computing LTD

Echtzeittesten mit MathWorks leicht gemacht Simulink Real-Time Tobias Kuschmider Applikationsingenieur

Big Data Challenges in Bioinformatics

WHITE PAPER. Harnessing the Power of Advanced Analytics How an appliance approach simplifies the use of advanced analytics

Building Platform as a Service for Scientific Applications

Relations with ISV and Open Source. Stephane Requena GENCI

Information Visualization WS 2013/14 11 Visual Analytics

Hadoop Cluster Applications

The Scientific Data Mining Process

Massive Cloud Auditing using Data Mining on Hadoop

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

DIY Parallel Data Analysis

International Workshop on Field Programmable Logic and Applications, FPL '99

CHAPTER 1 INTRODUCTION

Data Intensive Science and Computing

From GWS to MapReduce: Google s Cloud Technology in the Early Days

Understanding the Value of In-Memory in the IT Landscape

Finite Elements Infinite Possibilities. Virtual Simulation and High-Performance Computing

Transcription:

Working Group Outbrief Visualization and Data Analysis James Ahrens, David Rogers, Becky Springmeyer Eric Brugger, Cyrus Harrison, Laura Monroe, Dino Pavlakos Scott Klasky, Kwan-Liu Ma, Hank Childs LLNL-PRES-481881 1

Working Group Description The scope of our working group is scientific visualization and data analysis. Scientific visualization refers to the process of transforming scientific simulation and experimental data into images to facilitate visual understanding Data analysis refers to the process of transforming data into an information-rich form via mathematical or computational algorithms to promote better understanding Data management - shared with IONS refers to the process of tracking, organizing and enhancing the use of scientific data The purpose of our work is to enable scientific discovery and understanding. Our scope includes an exascale e software and hardware a infrastructure that effectively supports visualization and data analysis. 2

Identify current state-of-the-art Production visualization tools scalably process large data Suite of data-parallel visualization, analysis and rendering algorithms Open-source tools ParaView, Visit Commercial tool Ensight ASC success story NNSA/ASC created the large-data scientific visualization tools in use by other agencies (NSF, DOD) and around the world 3

Identify Exascale Visualization and Data Analysis Needs Visualization and Data Analysis (VDA) Broad range of scope for VDA VDA as an application VDA as a service VDA as a systems infrastructure Note: like apps, VDA capabilities will require development to exploit opportunities in evolving platforms 4

1. Exascale Challenges storing a full-range of results for later analysis becomes impossible due to technology trends The rate of performance improvement of rotating storage is not keeping pace with compute. Provisioning additional disks is a possible mitigation strategy, however, power, cost and reliability issues will become a significant issue. A new in-situ exascale visualization and data analysis approach is needed: Slow output to data hierarchy / Data movement = power/cost Where will we process data? Different customer-driven approaches require integration at different HW levels 5

2. Exascale Challenges - Exascale simulation results must be distilled with quantifiable data reduction techniques Exascale as massive data Defacto data reduction technique 1 Visualization algorithms 2 Rendering massive numbers of polygons This puts lots of data into a single pixel, combined by the renderer This is a workable method but is it what the user wants? This approach provides the foundation for our current successes brute force approach that requires significant computing resources difficult to quantify the bias of this approach Approaches that quantifiably reduce data as it is generated need to be explored 6

3. Exascale Challenges - New exascale-enabled physics approaches require corresponding new visualization and data analysis approaches Implication of exascale as massive compute Statistical physics approaches Statistical modeling of a physical process Parametric studies record how a simulation responds in a parameter space of possibilities Multi-physics approaches simulate a linked model of different related phenomena such as a linked physics and chemistry simulation Multi-scale l approaches simulate phenomena at different spatial and temporal scales Understanding and presenting both summarized and highlighted results from multiple sources is an important technical challenge that needs to be addressed. 7

4. Exascale Challenges - Visualization and data analysis approaches will need to run efficiently on exascale platform architectures Need to take advantage of a very high degree of parallelism Technical challenges include achieving portability, efficiency and integration flexibility with simulation codes 8

1. Path Forward - New visualization and data analysis software infrastructure When?: Run-time, Postprocessing How?: Interactive, Batch In-situ analysis within the simulation code Run-time, (batch or interactive) Post-processing --- advanced querybased approach Required Partnership IONS, tools, systems Apps for co-design Metric Our success will be measured by our readiness for applications as machine delivery milestones are met Revolutionary approach Phase 1 Risks Prototype t approaches in applications How to do discovery science Phase 2&3 in an in-situ world? Develop and deploy Won t find an effective Continue R&D analysis approach for exascale applications 9

2. Advanced quantifiable data reduction algorithms Data triage How do we significantly reduce the data as it is generated? Statistical sampling Compression Multi-resolution Science-based feature extraction Revolutionary approach Phase 1 Prototype t approaches independently and with applications Phase 2&3 Develop and deploy Continue R&D Required Partnership Applied math Apps for co-design Metric Measure of amount of data reduced and quality of result, time Risks Won t find an effective analysis approach for exascale applications 10

3. Visualization and data analysis techniques to help understand advanced exascale physics Visualization and Data Analysis for: Statistical physics approaches Parametric studies Multi-physics approaches Multi-scale approaches how results from different aspects of a simulation suite relate to each other Required Partnership Applications for co-design Metric Ties to appropriate application milestones Evolutionary/Revolutionary approach Phase 1 Prototype t approaches independently and with applications Phase 2&3 Develop and deploy Continue R&D Risks Won t understand output of exascale applications 11

4. Implement core visualization and dataanalysis capability using a scalable parallel infrastructure Our visualization and data analysis solutions need to work on the exascale supercomputers on both swim lanes. Evolutionary approach Phase 1 Prototype t approaches independently and with applications Phase 2&3 Develop and deploy Continue R&D Required Partnership Programming models, tools and applications groups Metric Our success will be measured by our readiness for applications as machine delivery milestones are met Risks Not running on the machines 12

5. Exascale visualization and data analysis hardware infrastructure Data-intensive hardware infrastructure for the exascale platform Memory buffers for staged analysis and storage Analysis-enabled storage Large node memory portion of the supercomputer Evolutionary/Revolutionary approach Phase 1 Prototype approaches independently and with applications Phase 2&3 Develop and deploy Continue R&D Required Partnership HW, Systems, I/O Metric Ties to appropriate machine milestones Risks HW platform that makes data analysis difficult 13

6. Tracking and using knowledge about the scientific goals makes visualization and data analysis more effective Provenance Where the data comes from? How was it generated? Uncertainty quantification Workflow management and beyond Monitoring, debugging Supercomputer p situational awareness Revolutionary approach Phase 1 Prototype approaches independently and with applications Phase 2&3 Develop and deploy Continue R&D Required Partnership Tools, Systems, Applications for co-design Metric Time to solution, cost, improved quality Risks Won t know where data came from Inefficient use of supercomputer 14

Recommended Co-Design Strategy Critical steps/activities Data Analysis and Movement are critical cross-cutting cutting issues Develop well-defined HW needs for VDA In-situ and interactive data extraction could benefit from co-design (e.g. resource management, time series analysis) Pilot co-design projects to help develop teams and define co-design processes These can be subsets VDA and IO working with a defined app Working with vendors Well-defined communication with partners and vendors Visualization software vendors (e.g. Kitware, CEI), platform vendors Path Forward investments 15

Recommended Co-Design Strategy Role of skeleton/compact apps VDA mini-apps Investigate coupling of in-situ capabilities with applications at scale Inform Architecture decisions by providing information on use patterns Concerns/suggestions Organization effects outcomes Partnering with Verification/Validation, applied math Overlap with other groups Influence we can have on HW vendors w/in timeframe Particularly with respect to data Co-design should have significant investment and sustained commitment 16

Big Picture Issues Coordination VDA will be more tightly coupled with apps than in the past Data movement and provenance are shared responsibilities Test beds Common test t beds crucial to creating Exascale community Specialized test beds may be necessary Simulators Requirement: include adequate characterization of data movement Unclear if there are requirements for VDA-only simulation components Requirement: system-level simulation, to model various VDA use cases Remaining gaps Interactive data analysis (e.g. querying and extraction) for Exascale is coupled with in-situ (what you can compute at runtime) and computation on data (what you want to understand after the simulation) 17

Conclusions A data-oriented t d perspective is critical to exascale success 18

End R&D Challenges for HPC Simulation Environments 19