Big Data Challenges in Simulation-based Science
Manish Parashar*
Rutgers Discovery Informatics Institute (RDI2), Department of Computer Science
*With Hoang Bui, Tong Jin, Qian Sun, Fan Zhang, the ADIOS team, and other ex-students and collaborators.
Manish Parashar, Rutgers University
Outline Data Grand Challenges Data challenges of simulation-based science Rethinking the simulations -> insights pipeline The DataSpaces Project Conclusion
RDI2: Data-driven Discovery in Science
Nearly every field of discovery is transitioning from data-poor to data-rich:
Oceanography: OOI; Astronomy: LSST; Physics: LHC; Biology: sequencing; Sociology: the Web; Neuroscience: EEG, fMRI; Economics: POS terminals
Credit: Ed Lazowska, Univ. of Washington
Scientific Discovery through Simulations
Scientific simulations running on high-end computing systems generate huge amounts of data! If a single core produces 2MB/minute on average, one of these machines could generate simulation data at ~170TB per hour -> ~4PB per day -> ~1.4EB per year.
Successful scientific discovery depends on a comprehensive understanding of this enormous simulation data.
How do we enable computational scientists to efficiently manage and explore extreme-scale data, i.e., find the needles in the haystack?
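The slide's estimate is easy to sanity-check. A minimal back-of-envelope sketch, assuming a leadership-class machine with ~1.4 million cores (the core count is my assumption; the slide only gives the per-core rate):

```python
# Back-of-envelope check of the data-rate estimate on the slide.
# Assumption (not stated on the slide): ~1.4 million cores.
cores = 1_400_000          # assumed core count
mb_per_core_min = 2        # MB per core per minute (from the slide)

tb_per_hour = cores * mb_per_core_min * 60 / 1e6   # MB -> TB
pb_per_day = tb_per_hour * 24 / 1e3                # TB -> PB
eb_per_year = pb_per_day * 365 / 1e3               # PB -> EB

print(f"{tb_per_hour:.0f} TB/hour, {pb_per_day:.2f} PB/day, {eb_per_year:.2f} EB/year")
```

At that scale the machine produces roughly 168 TB/hour, ~4 PB/day, and ~1.5 EB/year, consistent with the slide's hourly and yearly figures.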
Scientific Discovery through Simulations
- Complex workflows integrating coupled models, data management/processing, and analytics: tight/loose coupling, data-driven, ensembles
- Advanced numerical methods (e.g., Adaptive Mesh Refinement)
- Integrated (online) analytics, uncertainty quantification
- Complex, heterogeneous components
- Large data volumes and data rates
- Data re-distribution (MxNxP), data transformations
- Dynamic data exchange patterns
- Strict performance/overhead constraints
Traditional Simulation -> Insight Pipelines Break Down
The traditional simulation -> insight pipeline (simulation machines -> raw data on storage servers -> analysis/visualization cluster):
- Run large-scale simulation workflows on large supercomputers
- Dump data to parallel disk systems
- Export data to archives
- Move data to user sites (usually selected subsets)
- Perform data manipulations and analysis on mid-size clusters
- Collect experimental/observational data and move it to analysis sites
- Compare experimental/observational data against simulation data for validation
Figure. Traditional data analysis pipeline
Challenges Faced by Traditional HPC Data Pipelines
- Data analysis challenge: can current data mining, manipulation, and visualization algorithms still work effectively on extreme-scale machines?
- I/O and storage challenge: an increasing performance gap, as disks are outpaced by computing speed
- Data movement challenge: lots of data movement between simulation and analysis machines, and between coupled multi-physics simulation components -> longer latencies. Improving data locality is critical: do the work where the data resides!
- Energy challenge: future extreme-scale systems are designed around low-power chips, yet much of the power consumption will be due to memory and data movement. The costs of data movement are increasing and dominating!
The Cost of Data Movement
Moving data between node memory and persistent storage is slow (a growing performance gap), and the energy cost of moving data is a significant concern:
Energy_move_data = bitrate * length^2 / cross_section_area_of_wire
From K. Yelick, Software and Algorithms for Exascale: Ten Ways to Waste an Exascale Computer
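The relation quoted on the slide can be written out explicitly; the quadratic dependence on wire length is the key point, since it means halving the distance data travels cuts the movement energy by a factor of four:

```latex
E_{\text{move}} \;=\; \text{bitrate} \times \frac{\text{length}^2}{A_{\text{wire}}}
```

where $A_{\text{wire}}$ is the cross-sectional area of the wire. This scaling is why in-situ and in-transit placement, which keep analysis close to the data, pay off in energy as well as time.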
Rethinking the Data Management Pipeline
Hybrid Staging + In-Situ & In-Transit Execution
Issues/Challenges:
- Programming abstractions/systems
- Mapping and scheduling
- Control and data flow
- Autonomic runtime
- Link with the external workflow
Related systems: GLEAN, Darshan, FlexPath, ...
Design space of possible workflow architectures
Location of the compute resources:
- Same cores as the simulation (in situ)
- Some (dedicated) cores on the same nodes
- Some dedicated nodes on the same machine
- Dedicated nodes on an external resource
Data access, placement, and persistence:
- Direct access to simulation data structures
- Shared memory access via hand-off / copy
- Shared memory access via non-volatile near-node storage (NVRAM)
- Data transfer to dedicated nodes or external resources
Synchronization and scheduling:
- Execute synchronously with the simulation every n-th simulation time step
- Execute asynchronously
Figure. Placement options: sharing cores with the simulation, using distinct cores on the same node, and processing data on remote staging nodes (staging options 1-3, spanning DRAM, NVRAM, SSD, and hard disk).
DataSpaces: A Scalable Shared Space Abstraction for Hybrid Data Staging [JCC12]
Shared-space programming abstraction over hybrid staging:
- Simple API for coordination, interaction, and messaging: put()/pub(), get()/sub()
- Provides a global-view programming abstraction
- Distributed, associative, in-deep-memory object store
- Online data indexing, flexible querying
- Exposed as a persistent service
- Autonomic cross-layer runtime management
- High-throughput/low-latency asynchronous data transport
Figure. Shared space model: applications coordinate through put()/get() on shared data (d) and meta-data (m) objects.
Figure. Aggregate data redistribution throughput (GB/s) for data sizes from 2GB to 256GB: DataSpaces scalability on ORNL Titan.
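To make the shared-space abstraction concrete, here is a toy in-memory sketch of a put()/get() interface over versioned, geometrically described data objects. This is an illustrative stand-in, not the real DataSpaces API (which is a distributed, RDMA-based staging service); all names and signatures here are my own simplifications:

```python
# Minimal in-memory sketch of a shared-space put()/get() abstraction.
# NOT the actual DataSpaces API: real DataSpaces is distributed,
# asynchronous, and indexed; this toy only shows the coordination model.

class SharedSpace:
    def __init__(self):
        # (variable name, version) -> {coordinate tuple: value}
        self._store = {}

    def put(self, var, version, origin, data):
        """Insert a 2-D block of data whose lower corner is `origin`."""
        obj = self._store.setdefault((var, version), {})
        oi, oj = origin
        for i, row in enumerate(data):
            for j, val in enumerate(row):
                obj[(oi + i, oj + j)] = val

    def get(self, var, version, lb, ub):
        """Query a rectangular region [lb, ub] (inclusive bounds)."""
        obj = self._store.get((var, version), {})
        return [[obj.get((i, j)) for j in range(lb[1], ub[1] + 1)]
                for i in range(lb[0], ub[0] + 1)]

space = SharedSpace()
# Two writers insert adjacent blocks of the same variable...
space.put("pressure", 0, (0, 0), [[1, 2], [3, 4]])
space.put("pressure", 0, (0, 2), [[5, 6], [7, 8]])
# ...and a reader queries a region spanning both writers' blocks.
print(space.get("pressure", 0, (0, 0), (1, 3)))
```

The point of the abstraction is exactly this decoupling: readers query by geometric region and never need to know which writer produced which block.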
DataSpaces Abstraction: Indexing + DHT [HPDC10]
- Dynamically constructed structured overlay using hybrid staging cores
- Index constructed online using SFC mappings and application attributes, e.g., application domain, data, field values of interest
- DHT used to maintain meta-data information, e.g., geometric descriptors for the shared data, FastBit indices
- Data objects load-balanced separately across staging cores
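A minimal sketch of the SFC idea, using a Z-order (Morton) mapping, one of the curves mentioned in this deck (the actual DataSpaces index construction differs): interleaving coordinate bits linearizes a multi-dimensional domain so that nearby points tend to get nearby one-dimensional keys.

```python
# Z-order (Morton) mapping: interleave the bits of x and y so that a
# 2-D domain can be indexed with a 1-D key that preserves locality.

def morton2d(x, y, bits=16):
    """Put bits of x at even positions and bits of y at odd positions."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key

# The four cells of a 2x2 block map to four consecutive keys, so a
# small spatial region becomes a small contiguous key range.
print([morton2d(x, y) for (x, y) in [(0, 0), (1, 0), (0, 1), (1, 1)]])
```

This locality is what lets a geometric query on the shared space be answered by contacting only the few staging cores that own the matching key range.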
DataSpaces: Enabling Coupled Scientific Workflows at Extreme Scales
- Multiphysics Code Coupling at Extreme Scales [CCGrid10]
- PGAS Extensions for Code Coupling [CCPE13]
- Data-centric Mappings for In-Situ Workflows [IPDPS12]
- Dynamic Code Deployment In-Staging [IPDPS11]
Figure. Compute nodes run simulations with in-situ analytics (over DART); in-transit analytics run on a pool of computation buckets in the staging area, coordinated through DataSpaces via "bucket-ready" and "data-ready" events, a resource scheduler, and assigned task & data descriptors.
Example: a data kernel compiled (gcc -c data_kernels.c), linked into the application executable, and launched with aprun; in C:
kernel_min {
  for i = 1, n
    for j = 1, m
      for k = 1, p
        if (min > A(i, j, k)) min = A(i, j, k)
}
or shipped as a Lua script (data_kernels.lua) executed on staging nodes by the runtime execution system (Rexec) via rexec():
for i = 0, ni-1 do
  for j = 0, nj-1 do
    for k = 0, nk-1 do
      val = input:get_val(i, j, k)
      if min > val then min = val end
    end
  end
end
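The dynamic code deployment idea above (ship a small analysis kernel to where the data lives, rather than moving the data) can be sketched in a few lines of Python. This uses exec() on a kernel source string as a stand-in for Rexec's compiled/Lua kernels; the function names are illustrative only:

```python
# Sketch of in-staging code deployment: a client ships a small kernel
# (as source text) to the staging service, which runs it next to the
# staged data instead of moving the data to the client.

def deploy_and_run(kernel_src, staged_data):
    """Compile kernel source and invoke its `kernel(data)` entry point."""
    namespace = {}
    exec(compile(kernel_src, "<staged-kernel>", "exec"), namespace)
    return namespace["kernel"](staged_data)

# A minimum-reduction kernel, analogous to the kernel_min example above.
min_kernel = """
def kernel(data):
    lo = None
    for value in data:
        if lo is None or value < lo:
            lo = value
    return lo
"""

staged = [9.5, 3.2, 7.7, 0.4, 5.1]   # data resident in staging memory
print(deploy_and_run(min_kernel, staged))
```

Only the few bytes of kernel source and the scalar result cross the network; the bulk data never leaves the staging nodes.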
Integrating In-situ and In-transit Analytics [SC 12]
S3D: first-principles direct numerical simulation
- The simulation resolves features on the order of 10 simulation time steps
- Currently, only on the order of every 400th time step can be written to disk
- Temporal fidelity is compromised when analysis is done as a post-process
Recent data sets generated by S3D, developed at the Combustion Research Facility, Sandia National Laboratories.
Integrating In-situ and In-transit Analytics: Design
- Couple concurrent data analysis with the simulation: statistics, volume rendering, topology
- Identify features of interest
*J. C. Bennett et al., "Combining In-Situ and In-Transit Processing to Enable Extreme-Scale Scientific Analysis," SC 12, Salt Lake City, Utah, November 2012.
In-situ/In-transit Data Analytics
- Online data analytics for large-scale simulation workflows
- Combine both in-situ and in-transit placement to reduce impact on the simulation, reduce network data movement, and reduce total execution time
- Adaptive runtime management: optimize the performance of the simulation-analysis workflow through adaptive data and analysis placement and data-centric task mapping
Figure. Overview of the in-situ/in-transit data analysis framework: the simulation runs on primary resources with in-situ processing over shared data structures; asynchronous in-situ to in-transit transfers feed task executors on secondary resources, which pull task & data descriptors from the workflow manager.
Figure. System architecture for cross-layer runtime adaptation
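One way to picture the placement decision is a simple cost model: run a task in situ (blocking the simulation for the task's full runtime) or ship its data to in-transit resources (blocking only for the non-overlapped part of the transfer). This model and all its parameters are illustrative assumptions of mine, not the framework's actual policy:

```python
# Toy cost model for in-situ vs. in-transit placement of one analysis
# task. All parameter values are illustrative, not measured.

def choose_placement(task_seconds, data_gb, link_gb_per_s, overlap=0.9):
    """Return ('in-situ' | 'in-transit', simulation-blocking seconds).

    In situ, the task blocks the simulation for its full runtime.
    In transit, the simulation only pays the fraction of the data
    transfer that cannot be overlapped with computation; the analysis
    itself runs on secondary resources.
    """
    in_situ_cost = task_seconds
    in_transit_cost = (data_gb / link_gb_per_s) * (1.0 - overlap)
    if in_transit_cost < in_situ_cost:
        return "in-transit", in_transit_cost
    return "in-situ", in_situ_cost

# A heavy topology task with modest data is worth shipping out...
print(choose_placement(task_seconds=2.7, data_gb=4.0, link_gb_per_s=2.0))
# ...while a cheap statistics pass is better kept in situ.
print(choose_placement(task_seconds=0.05, data_gb=4.0, link_gb_per_s=2.0))
```

This mirrors the pattern in the S3D results that follow: expensive tasks (topology) benefit from hybrid placement, while cheap ones (statistics) cost about the same either way.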
Simulation case study with S3D: timing results for 4896 cores and analysis every simulation time step (bars decompose into in-situ time, data movement, and in-transit time; percentages give overhead relative to simulation time):
- simulation (S3D): 16.85 s
- in-situ statistics: 1.69 s (10.0%)
- hybrid statistics: 1.7 s (10.1%)
- in-situ visualization: 0.73 s (4.3%)
- hybrid visualization: 5.07 s (0.04%)
- hybrid topology: 2.72 s in-situ + 2.06 s data movement + 119.81 s in-transit (16.1%)
Simulation case study with S3D: timing results for 4896 cores and analysis every 10th simulation time step (percentages give overhead relative to simulation time):
- simulation (S3D): 168.5 s per analysis interval
- in-situ statistics: 1.0%
- hybrid statistics: 1.0%
- in-situ visualization: 0.43%
- hybrid visualization: 0.004%
- hybrid topology: 119.81 s in-transit (1.61%)
Simulation case study with S3D: timing results for 4896 cores and analysis every 100th simulation time step (percentages give overhead relative to simulation time):
- simulation (S3D): 1685 s per analysis interval
- in-situ statistics: 0.1%
- hybrid statistics: 0.1%
- in-situ visualization: 0.04%
- hybrid visualization: 0.004%
- hybrid topology: 0.16%
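The trend across these three S3D slides is simple amortization: the in-situ analysis cost is roughly fixed per invocation, so its relative overhead falls linearly with the analysis interval. Checking this with the in-situ statistics numbers from the slides:

```python
# Overhead of a fixed-cost in-situ analysis task as the analysis
# interval grows (numbers taken from the S3D case study above).

insitu_statistics_s = 1.69      # seconds per invocation
simulation_step_s = 16.85       # simulation seconds per time step

for every_kth_step in (1, 10, 100):
    window = simulation_step_s * every_kth_step
    overhead = 100.0 * insitu_statistics_s / window
    print(f"analysis every {every_kth_step:>3} steps: {overhead:.2f}% overhead")
```

This reproduces the 10.0% -> 1.0% -> 0.1% progression reported on the three slides.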
In-Situ Feature Extraction and Tracking using Decentralized Online Clustering (DISC 12)
DOC workers are executed in-situ on the simulation machines: on each compute node, some processor cores run the simulation while others run DOC workers, connected through the DOC overlay.
Benefits of runtime feature extraction and tracking:
(1) Scientists can follow the events of interest (or data of interest)
(2) Scientists can do real-time monitoring of the running simulations
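A minimal sketch of the online-clustering idea: points stream in one at a time, and each either joins a nearby cluster or seeds a new one. This is a sequential leader-style toy, not the actual DOC algorithm, which is decentralized across workers on the overlay:

```python
# Toy online clustering: each incoming point joins the nearest cluster
# within `radius`, otherwise it seeds a new cluster. A sequential
# stand-in for the decentralized scheme on the slide.

def online_cluster(points, radius):
    clusters = []   # list of [running centroid, member count]
    for p in points:
        best, best_d = None, None
        for c in clusters:
            d = abs(p - c[0])
            if best_d is None or d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= radius:
            # Incrementally update the running centroid.
            best[1] += 1
            best[0] += (p - best[0]) / best[1]
        else:
            clusters.append([p, 1])
    return clusters

# Two well-separated value populations in the stream -> two clusters,
# discovered without ever holding the full data set.
stream = [1.0, 1.2, 0.9, 10.0, 10.3, 1.1, 9.8]
print(online_cluster(stream, radius=2.0))
```

Because the state per cluster is just a centroid and a count, workers can track evolving features step by step without storing the simulation's full output.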
In-situ visualization and monitoring with staging (SC 12)
Figure. Pixie3D (1024 cores) couples through DataSpaces (pixie3d.bp) to Pixplot (8 cores, producing record.bp) and a ParaView server (4 cores), with Pixmon monitoring from 1 core on the login node.
Scalable Online Data Indexing and Querying (BDAC 14, CCPE 15)
Enable query-driven data analytics to extract insights from the large volume of scientific simulation data.
Figure. Conceptual view of programmable online indexing & querying using multiple indices: coordinate-based queries are served by Hilbert and Z-order SFC indices, value-based queries by bitmap and tree indices, with indices hashed (H1-H4) across staging nodes (N1-N4).
Figure. Value-based online index and query framework using FastBit
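The value-based query path can be illustrated with a tiny equality-encoded bitmap index, the core idea behind FastBit-style indices (binning strategy and bitmap compression are omitted here, and the function names are my own):

```python
# Tiny equality-encoded bitmap index: one integer bitmask per value
# bin. Range queries become bitwise ORs over the covered bins.

def build_bitmap_index(values, bins):
    """bins: list of (lo, hi) half-open intervals -> one bitmask each."""
    index = [0] * len(bins)
    for pos, v in enumerate(values):
        for b, (lo, hi) in enumerate(bins):
            if lo <= v < hi:
                index[b] |= 1 << pos
    return index

def query(index, bins, lo, hi):
    """Positions of values in [lo, hi): OR the masks of covered bins."""
    mask = 0
    for b, (blo, bhi) in enumerate(bins):
        if blo >= lo and bhi <= hi:
            mask |= index[b]
    return [pos for pos in range(mask.bit_length()) if (mask >> pos) & 1]

temps = [310, 455, 120, 480, 305]
bins = [(0, 200), (200, 400), (400, 600)]
idx = build_bitmap_index(temps, bins)
print(query(idx, bins, 400, 600))   # positions of values in [400, 600)
```

A query like "where is temperature above 400?" touches only the precomputed bitmasks, never the raw simulation data, which is what makes value-based querying feasible in staging.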
Multi-tiered Data Staging with Autonomic Data Placement Adaptation: Overview (IPDPS 15)
- A multi-tiered data staging approach that uses both DRAM and SSD to support data-intensive simulation workflows with large data coupling requirements
- An efficient application-aware data placement mechanism that leverages user-provided data access pattern information
Fig. Conceptual framework for realizing multi-tiered staging for coupled data-intensive simulation workflows.
Wide-area Staging (Demo at SC 14)
- Wide-area shared staging abstraction achieved by interconnecting multiple staging instances (ADIOS/DataSpaces endpoints linked via TCP/IP, RDMA, and GridFTP, coordinated by a control node)
- Leverages SDN for provisioning, plus workflow/Grid technologies
- Data placement optimized based on demand, availability, locality, etc.
- Combustion: S3D uses a chemical model to simulate the combustion process in order to understand the ignition characteristics of bio-fuels for automotive usage. Publication: SC 2012.
- Fusion: the EPSi simulation workflow provides insight into edge plasma physics in magnetic fusion devices by simulating the movement of millions of particles. Publications: CCGrid 2010, Cluster 2014.
- Subsurface modeling: IPARS uses multi-physics, multi-model formulations and coupled multi-block grids to support oil reservoir simulation of realistic, high-resolution reservoir studies with a million or more grids. Publication: CCGrid 2011.
- Computational mechanics: FEM coupled with AMR is often used to solve heat transfer or fluid dynamics problems; AMR only performs refinements on important regions of the mesh. (work in progress)
- Material science: the SNS workflow enables integration of data, analysis, and modeling tools for materials science experiments to accelerate scientific discoveries, which requires beam-line data query capability. (work in progress)
- Material science: QMCPack utilizes quantum Monte Carlo (QMC) studies in heterogeneous catalysis of transition metal nanoparticles, phase transitions, properties of materials under pressure, and strongly correlated materials. (work in progress)
Summary & Conclusions
- Complex applications running on high-end systems generate extreme amounts of data that must be managed and analyzed to gain insights
- Data costs (performance, latency, energy) are quickly becoming dominant
- Traditional data management/analytics pipelines are breaking down
- Hybrid data staging, in-situ workflow execution, etc. can address these challenges, letting users efficiently intertwine applications, libraries, and middleware for complex analytics
- Many challenges remain: programming, mapping and scheduling, control and data flow, autonomic runtime management
- The DataSpaces project explores solutions at various levels
Thank You!! EPSi Edge Physics Simulation Manish Parashar Email: parashar@rutgers.edu WWW: parashar.rutgers.edu WWW: dataspaces.org
DataSpaces Team @ RU