Big Data Challenges in Simulation-based Science




Big Data Challenges in Simulation-based Science
Manish Parashar, Rutgers Discovery Informatics Institute (RDI2), NSF Center for Cloud and Autonomic Computing (CAC), http://parashar.rutgers.edu
*With Hoang Bui, Tong Jin, Qian Sun, Fan Zhang and other ex-students and collaborators (ORNL, GT, UU, NCSU, SNL, LBNL, LLNL, ...)

Outline
- Data grand challenges in simulation-based science
- Rethinking the simulations -> insights pipeline
- The DataSpaces project
- Conclusion

Modern Science & Society Transformed by Compute & Data
New paradigms and practices in science and engineering: data-driven, data- and compute-intensive, inherently multi-disciplinary, collaborative (university, national, global).

Many Challenges: Computing, Data, Software, People
- Computing: Multicore; large and increasing core counts, deep memory hierarchies (TH-2: 54.9 PF, 3.12 M cores, 1.4 PB, 25 MW). New programming models and concerns (fault tolerance, energy, etc.). New models & technologies: clouds, grids, hybrid manycore, accelerators, deep storage hierarchies.
- Data: Generating more data than in all of human history: how do we preserve, mine, and share it? How do we create data scientists/engineers?
- Software: Complex applications on coupled compute-data-networked environments; tools needed. Modern apps: 10^6+ lines of code, many contributing groups, decades to develop, very long lifetimes.
- People: Multidisciplinary expertise essential! Appropriate academic programs and career tracks needed.

The Era of eScience and Big Data
Modern scientific networks, instruments, and experiments are clearly producing Big Data:
- The SKA (Square Kilometer Array) project will generate 1 EB of data per day in 2020.
- Climate data: 36 TB in 2004 -> 2 PB in 2012, and still increasing.
- Genome sequencing output is doubling every 9 months.
- The LHC will produce roughly 15 PB of data per year, plus many smaller datasets.
- LSST will have produced 100 PB of data by the end of this decade.
Figure. Bytes per day (gigabytes through exabytes) for genomics, climate/environment, LHC, LSST, and TeraGrid/XSEDE/Blue Waters, 2012 through 2020. Credit: R. Pennington/A. Blatecky.
But what about HPC?

Scientific Discovery through Simulations (I)
- ACI is providing increasing computational scales, capabilities, and capacities.
- Scientific simulations running on high-end computing systems generate huge amounts of data: if a single core produces 2 MB/minute on average, one of these machines could generate simulation data at ~170 TB per hour -> ~4 PB per day -> ~1.4 EB per year (see the back-of-envelope estimate below).
- Successful scientific discovery depends on a comprehensive understanding of this enormous simulation data.
- How do we enable computational scientists to efficiently manage and explore extreme-scale data, i.e., find the needles in the haystack?
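A rough back-of-envelope check of these rates, assuming a machine with on the order of 1.4 million cores (the slide does not say which machine; TH-2, cited earlier with 3.12 M cores, would roughly double these figures):

1.4 x 10^6 cores x 2 MB/min ≈ 2.8 TB/min ≈ 170 TB/hour; x 24 ≈ 4 PB/day; x 365 ≈ 1.5 EB/year.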

Scientific Discovery through Simulations (II)
- Complex workflows integrating coupled models, data management/processing, and analytics: tight/loose coupling, data-driven, ensembles.
- Advanced numerical methods (e.g., adaptive mesh refinement).
- Integrated (online) uncertainty quantification and analytics.
- Complex, heterogeneous components; large data volumes and data rates.
- Complex couplings, data redistribution, data transformations; dynamic data exchange patterns.
- Strict performance and energy constraints.

Traditional Simulation -> Insight Pipelines Break Down
Figure. Traditional data analysis pipeline.
The traditional simulation -> insight pipeline:
- Run large-scale simulation workflows on large supercomputers.
- Dump data on parallel disk systems.
- Export data to archives.
- Move data, usually selected subsets, to users' sites.
- Perform data manipulations and analysis on mid-size clusters.
- Collect experimental/observational data, move it to analysis sites, and compare it against simulation data for validation.

Challenges Faced by Traditional HPC Data Pipelines
Figure. Traditional data analysis pipeline.
- Scalable data analytics challenge: can current data mining, manipulation, and visualization algorithms still work effectively on extreme-scale machines?
- I/O and storage challenge: an increasing performance gap, as disks are outpaced by computing speed.
- Data movement challenge: lots of data movement between simulation and analysis, and between coupled multi-physics simulation components => longer latencies. Improving data locality is critical: do work where the data resides!
- Energy challenge: future extreme-scale systems are designed around low-power chips; however, much of the power consumption will be due to memory and data movement. The costs of data movement are increasing and dominating!

The Cost of Data Movement
- Moving data between node memory and persistent storage is slow (a growing performance gap).
- The energy cost of moving data is a significant concern (illustrated below):

    Energy_move_data = bitrate * length^2 / cross_section_area_of_wire

(From K. Yelick, "Software and Algorithms for Exascale: Ten Ways to Waste an Exascale Computer.")
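To make the scaling concrete, here is a minimal sketch that compares the relative energy of on-chip vs. off-chip data movement according to the relation above; the wire geometries are made-up values chosen only for illustration, not measurements.

#include <stdio.h>

/* Relative energy of moving data over a wire, per the relation above:
 * E ~ bitrate * length^2 / cross_section_area.
 * Absolute constants are omitted; only ratios are meaningful here. */
static double rel_energy(double bitrate, double length, double area)
{
    return bitrate * length * length / area;
}

int main(void)
{
    /* Hypothetical geometries (arbitrary units, illustration only). */
    double on_chip  = rel_energy(1.0, 0.001, 1.0);  /* ~1 mm on-chip wire  */
    double off_chip = rel_energy(1.0, 0.05,  1.0);  /* ~5 cm to DRAM/board */

    printf("off-chip / on-chip energy ratio: %.0fx\n", off_chip / on_chip);
    /* Length grows 50x, so energy grows 50^2 = 2500x at equal bitrate/area. */
    return 0;
}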

Challenges Faced by Traditional HPC Data Pipelines
Figure. Traditional data analysis pipeline.
The costs of data movement (power and performance) are increasing and dominating! We need to rethink the data management pipeline:
- Reduce data movement.
- Move computation/analytics closer to the data (a minimal in-situ hook is sketched after this list).
- Add value to simulation data along the I/O path.

Rethinking the Data Management Pipeline: Hybrid Staging + In-Situ & In-Transit Execution
Issues/Challenges:
- Programming abstractions/systems
- Mapping and scheduling
- Control and data flow
- Autonomic runtime
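As a minimal sketch of the in-situ idea (hypothetical function names; not the DataSpaces or ADIOS API), the simulation loop invokes analysis on in-memory data every few steps instead of writing every step to disk and post-processing later:

#include <stdio.h>
#include <stdlib.h>

#define N 1000000          /* grid points per process (illustrative)         */
#define ANALYSIS_EVERY 10  /* run in-situ analytics every 10th time step     */
#define OUTPUT_EVERY  400  /* full dump to persistent storage is rare/costly */

/* Hypothetical in-situ analytic: reduce the field to a scalar in place,
 * so only a few bytes (not the full field) ever leave the node. */
static double in_situ_min(const double *field, size_t n)
{
    double m = field[0];
    for (size_t i = 1; i < n; i++)
        if (field[i] < m) m = field[i];
    return m;
}

static void simulate_step(double *field, size_t n, int step)
{
    for (size_t i = 0; i < n; i++)           /* stand-in for the real solver */
        field[i] = (double)((i + step) % 97);
}

int main(void)
{
    double *field = malloc(N * sizeof *field);
    if (!field) return 1;

    for (int step = 0; step < 100; step++) {
        simulate_step(field, N, step);

        if (step % ANALYSIS_EVERY == 0)       /* analyze where the data lives */
            printf("step %d: min = %g\n", step, in_situ_min(field, N));

        if (step % OUTPUT_EVERY == 0) {
            /* write_checkpoint(field, N, step);  -- infrequent full dump */
        }
    }
    free(field);
    return 0;
}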

Design Space of Possible Workflow Architectures
- Location of the compute resources: same cores as the simulation (in situ); some (dedicated) cores on the same nodes; some dedicated nodes on the same machine; dedicated nodes on an external resource.
- Data access, placement, and persistence: direct access to simulation data structures; shared memory access via hand-off/copy; shared memory access via non-volatile near-node storage (NVRAM); data transfer to dedicated nodes or external resources.
- Synchronization and scheduling: execute synchronously with the simulation every nth simulation time step, or execute asynchronously.
(A small configuration sketch enumerating these options follows below.)
Figure. Placement options for analysis and visualization tasks: sharing cores with the simulation, using distinct cores on the same node (DRAM/NVRAM/SSD/hard disk), or processing data on remote staging nodes over the network (staging options 1-3).

DataSpaces: In-situ/In-transit Data Management & Analytics (ADIOS/DataSpaces, dataspaces.org)
- Virtual shared-space programming abstraction; simple API for coordination, interaction, and messaging.
- Distributed, associative, in-memory object store; online data indexing, flexible querying.
- Adaptive cross-layer runtime management; hybrid in-situ/in-transit execution.
- Efficient, high-throughput/low-latency asynchronous data transport.
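The three axes above can be captured as a small configuration structure; a minimal sketch with hypothetical type and field names (not part of DataSpaces or ADIOS):

#include <stdio.h>

/* One point in the workflow-architecture design space described above. */
enum placement   { IN_SITU_SAME_CORES, DEDICATED_CORES_SAME_NODE,
                   DEDICATED_NODES_SAME_MACHINE, EXTERNAL_RESOURCE };
enum data_access { DIRECT_ACCESS, SHARED_MEM_COPY, NEAR_NODE_NVRAM,
                   TRANSFER_TO_STAGING };
enum scheduling  { SYNC_EVERY_NTH_STEP, ASYNCHRONOUS };

struct analysis_config {
    enum placement   where;    /* location of the analysis compute resources */
    enum data_access access;   /* how analysis reaches the simulation data   */
    enum scheduling  when;     /* coupling to the simulation time loop       */
    int              nth_step; /* used only for SYNC_EVERY_NTH_STEP          */
};

int main(void)
{
    /* Example: in-transit analysis on dedicated staging nodes, fed
     * asynchronously, roughly the DataSpaces-style configuration. */
    struct analysis_config cfg = {
        .where = DEDICATED_NODES_SAME_MACHINE,
        .access = TRANSFER_TO_STAGING,
        .when = ASYNCHRONOUS,
        .nth_step = 0
    };
    printf("placement=%d access=%d scheduling=%d\n",
           cfg.where, cfg.access, cfg.when);
    return 0;
}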

DataSpaces: A Scalable Shared Space Abstraction for Hybrid Data Staging [HPDC10, JCC12]
- Virtual shared-space abstraction; simple API for coordination, interaction, and messaging; provides a global-view programming abstraction consistent with PGAS.
- Distributed, associative, in-memory object store; online data indexing, flexible querying.
- Adaptive cross-layer runtime management; hybrid in-situ/in-transit execution; data-centric mappings.
- High-throughput/low-latency memory-to-memory asynchronous data transport.
- Supports dynamic coordination and interaction patterns between the coupled applications, transparent data redistribution, complex geometry-based queries, and in-space (online) data transformation and manipulation.

DataSpaces Abstraction: Indexing + DHT
- Dynamically constructed structured overlay using hybrid staging cores.
- Index constructed online using SFC mappings and application attributes, e.g., application domain, data, field values of interest, etc.
- DHT used to maintain metadata information, e.g., geometric descriptors for the shared data, FastBit indices, etc.
- Data objects load-balanced separately across staging cores: the SFC maps the global domain to a set of intervals; these intervals are non-contiguous and can lead to load imbalance, so a second mapping compacts them into a contiguous virtual interval, which is then split equally across the DataSpaces nodes (see the sketch below).
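A minimal sketch of this two-level mapping, using a Morton (Z-order) curve as a stand-in for the space-filling curve; the actual SFC, key layout, and data structures used by DataSpaces may differ.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Interleave the bits of (x, y) into a Morton (Z-order) index, a simple
 * stand-in for the space-filling curve used for indexing. */
static uint32_t morton2d(uint16_t x, uint16_t y)
{
    uint32_t z = 0;
    for (int b = 0; b < 16; b++) {
        z |= (uint32_t)((x >> b) & 1u) << (2 * b);
        z |= (uint32_t)((y >> b) & 1u) << (2 * b + 1);
    }
    return z;
}

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    enum { NX = 8, NY = 8, NSERVERS = 4 };
    uint32_t keys[NX * NY];
    int n = 0;

    /* Step 1: SFC mapping -- cells of an 8x8 subdomain map to Morton keys.
     * In general the occupied keys form non-contiguous intervals of the curve. */
    for (int i = 0; i < NX; i++)
        for (int j = 0; j < NY; j++)
            keys[n++] = morton2d((uint16_t)i, (uint16_t)j);

    /* Step 2: compact -- sort the occupied keys; their rank in this sorted
     * order is their position on a contiguous "virtual" interval. */
    qsort(keys, (size_t)n, sizeof keys[0], cmp_u32);

    /* Step 3: split the contiguous virtual interval equally across servers,
     * so each staging server holds (almost) the same number of objects. */
    for (int r = 0; r < n; r++) {
        int server = (int)((long)r * NSERVERS / n);
        if (r % 16 == 0)
            printf("key 0x%08x (rank %2d) -> server %d\n",
                   (unsigned)keys[r], r, server);
    }
    return 0;
}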

DataSpaces: Scalability on ORNL Titan
Figure. Data redistribution time (seconds) and aggregate data redistribution throughput (GB/s) for total data sizes from 2 GB to 256 GB and writer/reader core counts from 512/32 up to 64K/4K.
- Weak scaling with increasing numbers of processors; the applications redistribute data using DataSpaces.
- Application 1 runs on M processors and inserts data into the space; Application 2 runs on N processors and retrieves data from the space.
- Result: a 128-fold increase in application size, from 512 to 64K writers, while the total data size exchanged per step increases from 2 GB to 256 GB.

DataSpaces: Enabling Coupled Scientific Workflows at Extreme Scales
- Multiphysics code coupling at extreme scales [CCGrid10]
- Data-centric mappings for in-situ workflows [IPDPS12]
- PGAS extensions for code coupling [CCPE13]
- Dynamic code deployment in-staging [IPDPS11]
Figure. Dynamic code deployment in staging: a data kernel such as

    kernel_min {
      for i = 1, n
        for j = 1, m
          for k = 1, p
            if (min > A(i, j, k)) min = A(i, j, k)
    }

is compiled (gcc -c data_kernels.c) and linked into the application executable, which is launched with aprun on the compute nodes; at runtime the kernel is shipped via rexec() to the runtime execution system (Rexec) on the staging nodes and executed there, e.g., as the Lua kernel data_kernels.lua:

    for i = 0, ni-1 do
      for j = 0, nj-1 do
        for k = 0, nk-1 do
          val = input:get_val(i,j,k)
          if min > val then min = val end
        end
      end
    end

Applications Enabled
- Combustion: S3D uses a chemical model to simulate the combustion process in order to understand the ignition characteristics of bio-fuels for automotive use [SC 2012].
- Fusion: the EPSi simulation workflow provides insight into edge plasma physics in magnetic fusion devices by simulating the movement of millions of particles [CCGrid 2010].
- Subsurface modeling: IPARS uses multi-physics, multi-model formulations and coupled multi-block grids to support oil reservoir simulation of realistic, high-resolution reservoir studies with a million or more grids [CCGrid 2011].
- Computational mechanics: FEM coupled with AMR is often used to solve heat transfer or fluid dynamics problems; AMR only performs refinement on important regions of the mesh [SC 2014 (in progress)].
- Materials science: the SNS workflow enables integration of data, analysis, and modeling tools for materials science experiments to accelerate scientific discoveries, which requires beam-line data query capability.
- Materials science: QMCPack applies quantum Monte Carlo (QMC) to studies of heterogeneous catalysis on transition metal nanoparticles, phase transitions, properties of materials under pressure, and strongly correlated materials.

Workflows beyond Simulations
- Content-based image retrieval (CBIR) on digitized peripheral blood smear specimens using low-level morphological features (MICAII 2012, CAC 2013, MICAII 2013).
- Asynchronous replica exchange: monitor the progress of replicas using secondary structure prediction methods (J. Comput. Chem. 2008).
- Control of fluid flow in micro-devices by placing a sequence of pillar obstacles (MTAGS 2013, CISE 2013).
- Monte Carlo numerical strategies to unravel how nucleosomes shape DNA into chromatin (Biophysical J. 2014).

In-Situ Feature Extraction and Tracking using Decentralized Online Clustering (DISC 12, ICAC 10)
- DOC workers are executed in situ on the simulation machine.
Figure. DOC overlay across the simulation compute nodes; on each compute node, some processor cores run the simulation and one core runs a DOC worker.
- Benefits of runtime feature extraction and tracking: (1) scientists can follow the events (or data) of interest; (2) scientists can do real-time monitoring of the running simulations.

In-Situ Visualization and Monitoring with Staging
Figure. Pixie3D (1024 cores) and Pixplot (8 cores) exchange pixie3d.bp/record.bp data through DataSpaces with a ParaView server (4 cores) and Pixmon (1 core, on a login node).

AMR-as-a-Service using DataSpaces
Figure. The FEM code (TalyFEM) and the AMR service each exchange Grid/GridField (pgrid/pgridfield) objects through DataSpaces.
- Finite Element Model (FEM): models using uniform 3D meshes for near-realistic engineering problems such as heat transfer, fluid flow, and phase transformation.
- AMR service: refines localized areas of interest, e.g., regions with high local error.
- DataSpaces-as-a-Service: persistent staging servers run as a service on the HEC platform; applications can connect, exchange data objects, and disconnect independently (see the lifecycle sketch below).

Overview of Recent DataSpaces Research
Programming abstractions / systems:
- DataSpaces: interaction, coordination, and messaging abstractions for coupled scientific workflows [HPDC10, CCGrid10, HiPC12, JCC12]
- XpressSpaces: PGAS extensions for coupling using DataSpaces [CCGrid11, CCPE13]
- ActiveSpaces: dynamic code deployment for in-staging data processing [IPDPS11]
Runtime mechanisms:
- Data-centric task mapping: reduce data movement and increase intra-node data sharing [IPDPS12, DISC12]
- In-situ & in-transit data analytics: simulation-time analysis of large-volume data by combining in-situ and in-transit execution [SC12, DISC12]
- Cross-layer adaptation: adaptive cross-layer approach for dynamic data management in large-scale simulation-analysis workflows [SC13]
- Value-based data indexing and querying: use FastBit to build in-situ, in-memory value-based indexing and query support in the staging area [in review]
- Power/performance tradeoffs: characterizing power/performance tradeoffs for data-intensive simulation workflows [SC13]
- Data staging over deep memory hierarchies: build a distributed associative object store over hierarchical memory/storage, e.g., DRAM/NVRAM/SSD [HiPC13]
High-throughput/low-latency asynchronous data transport:
- DART: network-independent transport library for high-speed asynchronous data extraction and transfer [HPDC08]
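A minimal sketch of the connect/exchange/disconnect lifecycle implied by the service model; the function names and types below are hypothetical placeholders (local stubs so the sketch is self-contained), not the actual DataSpaces client API.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical staging-service client (placeholder names). A real client
 * would talk to remote, persistent staging servers instead of these stubs. */
struct staging_client { char addr[64]; };

static struct staging_client *staging_connect(const char *service_addr)
{
    struct staging_client *c = malloc(sizeof *c);
    if (c) { strncpy(c->addr, service_addr, sizeof c->addr - 1); c->addr[63] = 0; }
    return c;
}

static int staging_put(struct staging_client *c, const char *var, int version,
                       const uint64_t lb[3], const uint64_t ub[3], const void *data)
{
    (void)data;  /* a real client would ship the bytes to the staging servers */
    printf("put %s v%d box [%llu..%llu]^3 -> %s\n", var, version,
           (unsigned long long)lb[0], (unsigned long long)ub[0], c->addr);
    return 0;
}

static void staging_disconnect(struct staging_client *c) { free(c); }

int main(void)
{
    /* Each application connects to the persistent staging service on its own,
     * exchanges named, versioned objects described by a bounding box, and
     * disconnects -- the staging servers outlive any single client. */
    struct staging_client *c = staging_connect("staging-service:port");
    if (!c) return 1;

    double temperature[16 * 16 * 16] = {0};
    uint64_t lb[3] = {0, 0, 0}, ub[3] = {15, 15, 15};

    staging_put(c, "temperature", /*version=*/0, lb, ub, temperature);
    staging_disconnect(c);
    return 0;
}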

Integrating In-Situ and In-Transit Analytics (SC 12)
- S3D: first-principles direct numerical simulation.
- The simulation resolves features on the order of 10 simulation time steps, but currently only on the order of every 400th time step can be written to disk, so temporal fidelity is compromised when analysis is done as a post-process.
Figure. Recent data sets generated by S3D, developed at the Combustion Research Facility, Sandia National Laboratories.

In-Situ Topological Analysis as Part of S3D*
- Statistics, volume rendering, topology: identify features of interest.
*J. C. Bennett et al., "Combining In-Situ and In-Transit Processing to Enable Extreme-Scale Scientific Analysis," SC 12, Salt Lake City, Utah, November 2012.

Integrating In-Situ and In-Transit Analytics (SC 12)
- Primary resources execute the main simulation and the in-situ computations (see the partitioning sketch below).
- Secondary resources provide a staging area whose cores act as containers for in-transit computations.
- 4896 cores total (4480 simulation/in-situ; 256 in-transit; 160 task scheduling/data movement); simulation size: 1600x1372x430; all measurements are per simulation time step.

Simulation case study with S3D: timing results for 4896 cores and analysis every 10th simulation time step
Figure. Timing breakdown (bars split into in-situ, data movement, and in-transit components): simulation 168.5 s and hybrid topology 119.81 s; values of 1.0%, 1.0%, 0.43%, 0.004%, and 1.61% are annotated for in-situ statistics, hybrid statistics, in-situ visualization, hybrid visualization, and hybrid topology; x-axis 0-160 seconds.
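A minimal sketch of how one allocation might be partitioned into primary (simulation + in-situ) and secondary (in-transit staging) resources using MPI; this is illustrative only, and the actual S3D/DataSpaces infrastructure is more involved (e.g., it also reserves cores for scheduling and data movement).

#include <mpi.h>
#include <stdio.h>

/* Split one job allocation into primary resources (simulation + in-situ
 * analysis) and secondary resources (staging area for in-transit analysis),
 * in the spirit of the 4480/256/160 partitioning described above. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Illustrative split: the last 1/16 of the ranks become staging cores. */
    int n_staging = size / 16 > 0 ? size / 16 : 1;
    int is_staging = rank >= size - n_staging;

    MPI_Comm role_comm;  /* primary ranks in one comm, staging in another */
    MPI_Comm_split(MPI_COMM_WORLD, is_staging, rank, &role_comm);

    int role_rank, role_size;
    MPI_Comm_rank(role_comm, &role_rank);
    MPI_Comm_size(role_comm, &role_size);

    if (is_staging) {
        /* Secondary resources: receive staged data, run in-transit analysis. */
        printf("rank %d: staging core %d of %d\n", rank, role_rank, role_size);
    } else {
        /* Primary resources: run the simulation and in-situ analysis. */
        printf("rank %d: simulation core %d of %d\n", rank, role_rank, role_size);
    }

    MPI_Comm_free(&role_comm);
    MPI_Finalize();
    return 0;
}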

Simulation case study with S3D: timing results for 4896 cores and analysis every 100th simulation time step
Figure. Timing breakdown (bars split into in-situ, data movement, and in-transit components): simulation 1685 s; values of 0.1%, 0.1%, 0.04%, 0.004%, and 0.16% are annotated for in-situ statistics, hybrid statistics, in-situ visualization, hybrid visualization, and hybrid topology; x-axis 0-1600 seconds.

Challenges & Opportunities Ahead: The Vision of the Exascale Computing Initiative
- Achieve on the order of 10^18 operations per second and 10^18 bytes of storage.
- Address the next generation of scientific, engineering, and large-data problems.
- Enable extreme-scale computing: 1,000x the capabilities of today's computers with a similar size and power footprint; 20 pJ per average operation => roughly a 40x improvement over today's efficiency (see the power arithmetic below); billion-way concurrency.
- Productive system based on marketable technologies; ecosystem for application development and execution, and for system management.
- Deployed in the early 2020s.
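As a back-of-envelope sanity check on the power target (my estimate, not a figure from the slide): at 20 pJ per operation, an exaop machine draws

10^18 ops/s x 20 x 10^-12 J/op = 2 x 10^7 W = 20 MW,

which is comparable to the ~25 MW footprint of today's largest systems (e.g., TH-2 cited earlier), consistent with the goal of ~1,000x the capability at a similar power footprint.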

Challenges & Opportunities Ahead: Validation Workflows
Figure. A validation workflow coupling the first-principles fusion code XGC (core-edge turbulence with UQ), an ensemble of models with UQ, a knowledge database, synthetic diagnostics, visual analytics, and KSTAR tokamak diagnostics, with analysis, comparative analytics, visualization, and experiment mock-up.
- Use first-principles calculations on HPC to generate more accurate models.
- Automate these workflows, move work/data to satisfy metrics specified by users, and track provenance.
- Generate a database of knowledge from the codes, to provide predictive capability during/after each experiment.

Challenges & Opportunities Ahead: Pervasive Computational Ecosystems

Summary & Conclusions
- Complex applications running on high-end systems generate extreme amounts of data that must be managed and analyzed to get insights.
- Data costs (performance, latency, energy) are quickly becoming dominant; traditional data management/analytics pipelines are breaking down.
- Hybrid data staging, in-situ workflow execution, etc. can address these challenges, letting users efficiently intertwine applications, libraries, and middleware for complex analytics.
- Many challenges remain: programming, mapping and scheduling, control and data flow, autonomic runtime management. The DataSpaces project explores solutions at various levels.
- However, many challenges and opportunities lie ahead!

Thank You!
EPSi: Edge Physics Simulation
Manish Parashar, Ph.D.
Rutgers Discovery Informatics Institute (RDI2), Cloud & Autonomic Computing Center (CAC), Rutgers, The State University of New Jersey
Email: parashar@rutgers.edu
WWW: rdi2.rutgers.edu
WWW: dataspaces.org