Big Data Challenges in Simulation-based Science
|
|
- Baldric Hunt
- 8 years ago
- Views:
Transcription
1 Big Data Challenges in Simulation-based Science Manish Parashar* Rutgers Discovery Informatics Institute (RDI 2 ) Department of Computer Science *Hoang Bui, Tong Jin, Qian Sun, Fan Zhang, ADIOS Team, and other ex-students and collaborators. Manish Parashar, Rutgers University
2 Outline Data Grand Challenges Data challenges of simulation-based science Rethinking the simulations -> insights pipeline The DataSpaces Project Conclusion
3 RDI2 Data-driven Discovery in Science Nearly every field of discovery is transitioning from data poor to data rich Oceanography: OOI Astronomy: LSST Physics: LHC Biology: Sequencing Sociology: The Web Neuroscience: EEG, fmri Economics: POS terminals Credit: Ed Lazowska, Univ. of WA
4 Scientific Discovery through Simulations Scientific simulations running on high-end computing systems generate huge amounts of data! If a single core produces 2MB/minute on average, one of these machines could generate simulation data between ~170TB per hour -> ~700PB per day -> ~1.4EB per year Successful scientific discovery depends on a comprehensive understanding of this enormous simulation data How we enable the computation scientists to efficiently manage and explore extreme scale data: find the needles in haystack??
5 Scientific Discovery through Simulations Complex workflows integrating coupled models, data management/ processing, analytics Tight / loose coupling, data driven, ensembles Advanced numerical methods (E.g., Adaptive Mesh Refinement) Integrated (online) analytics, uncertainty quantification, Complex, heterogeneous components Large data volumes and data rates Data re-distribution (MxNxP), data transformations Dynamic data exchange patterns Strict performance/overhead constraints
6 Traditional Simulation -> Insight Pipelines Break Down Simulation Machines Simulation Raw Data Storage Servers Analysis/Visualization Cluster Traditional simulation -> insight pipeline Run large-scale simulation workflows on large supercomputers Dump data to parallel disk systems Export data to archives Move data to users sites usually selected subsets Figure. Traditional data analysis pipeline Perform data manipulations and analysis on mid-size clusters Collect experimental / observational data Move to analysis sites Perform comparison of experimental/observational to validate simulation data
7 Challenges Faced by Traditional HPC Data Pipelines Data analysis challenge Can current data mining, manipulation and visualization algorithms still work effectively on extreme scale machine? I/O and storage challenge Increasing performance gap: disks are outpaced by computing speed Data movement challenge Figure. Traditional data analysis pipeline Lots of data movement between simulation and analysis machines, between coupled multi-physics simulation components -> longer latencies Improving data locality is critical: do work where the data resides! Energy challenge Future extreme systems are designed to have low-power chips however, much greater power consumption will be due to memory and data movement! The costs of data movement are increasing and dominating!
8 The Cost of Data Movement Moving data between node memory and persistent storage is slow! performance gap The energy cost of moving data is a significant concern Energy _ move _ data = bitrate * length 2 cross_section_area_of_wire From K. Yelick, Software and Algorithms for Exascale: Ten Ways to Waste an Exascale Computer
9 Rethinking the Data Management Pipeline Hybrid Staging + In-Situ & In-Transit Execution Issues/Challenges Programming abstractions/systems Mapping and scheduling Control and data flow Autonomic runtime Link with the external workflow Systems Glean, Darshan, FlexPath,..
10 Design space of possible workflow architectures Loca%on of the compute resources Same cores as the simulahon (in situ) Some (dedicated) cores on the same nodes Some dedicated nodes on the same machine Dedicated nodes on an external resource Data access, placement, and persistence Direct access to simulahon data structures Shared memory access via hand- off / copy Shared memory access via non- volahle near node storage (NVRAM) Data transfer to dedicated nodes or external resources Synchroniza%on and scheduling Execute synchronously with simulahon every n th simulahon Hme step Execute asynchronously Analysis Tasks Simulation Visualization DRAM DRAM Node 2 Node 1 Sharing cores with the simulation Using distinct cores on same node CPUs DRAM NVRAM SSD Hard Disk... Network Node N DRAM DRAM Simulation Node Network Communication Staging Node Processing data on remote nodes CPUs DRAM NVRAM Staging option 3 SSD Hard Disk Staging option 1 Staging option 2
11 l l l l DataSpaces: A Scalable Shared Space Abstraction for Hybrid Data Staging [JCC12] Shared-space programming abstraction over hybrid staging l l l l Simple API for coordination, interaction and messaging Provides a global-view programming abstraction Distributed, associative, in-deepmemory object store Online data indexing, flexible querying Exposed as a persistent service Autonomic cross-layer runtime management High-throughput/low-latency asynchronous data transport 0 Throughput(GB/s) Simula on get() App put()/pub() m m m m d d d d d put() Shared Space Model 64K/4K 32K/2K 16K/1K 8192/ / / /64 512/32 d: Data object m: Meta-data object App1 get()/sub() get() App3 2GB 4GB 8GB 16GB 32GB 64GB 128GB 256GB Aggregate data redistribution throughput DataSpaces scalability on ORNL Titan
12 DataSpaces Abstraction: Indexing + DHT [HPDC10] Dynamically constructed structured overlay using hybrid staging cores Index constructed online using SFC mappings and applications attributes E.g., application domain, data, field values of interest, etc. DHT used to maintain meta-data information E.g., geometric descriptors for the shared data, FastBit indices, etc. Data objects load-balanced separately across staging cores
13 DataSpaces: Enabling Coupled Scientific Workflows at Extreme Scales Multphysics Code Coupling at Extreme Scales [CCGrid10] PGAS Extensions for Code Coupling [CCPE13] compute nodes running simulation and in-situ analytics Pool of computation bucket in the staging area Simulation Simulation Simulation Simulation in-situ analytics DART in-situ analytics in-situ analytics DART in-situ analytics DART DART Data in-transit analytics in-transit analytics DART in-transit analytics DART in-transit analytics DART DART... "bucket-ready" event "data-ready" event Assigned task & data descriptors Resoure scheduler Data Interaction and coordnation DataSpaces data_kernels.o 0x69 0x20 0x61 0x6d 0x20 0x63 0x6f 0x6f 0x6c 0x0 data_kernels.c kernel_min { for i = 1, n for j = 1, m for k = 1, p if (min > A(i, j, k)) min = A(i, j, k) } Applications.o Applications executable Data-centric Mappings for In-Situ Workflows [IPDPS12] Dynamic Code Deployment In-Staging [IPDPS11] gcc -c Link aprun Compute nodes return() Staging nodes aprun rexec() Load Runtime execution system (Rexec) data_kernels.lua for i = 0, ni-1 do for j = 0, nj-1 do for k = 0, nk-1 do val = input:get_val(i,j,k) if min > val then min = val end end end
14 Integrating In-situ and In-transit Analytics [SC 12] S3D: First-principles direct numerical simulation Simulation resolves features on the order of 10 simulation time steps Currently on the order of every 400 th time step can be written to disk Temporal fidelity is compromised when analysis is done as a post-process Recent data sets generated by S3D, developed at the CombusHon Research Facility, Sandia NaHonal Laboratories
15 Integrating In-situ and In-transit Analytics Design Couple concurrent data analysis with simulation StaHsHcs Volume Rendering Topology Identify features of interest *J. C. Bennett et al., Combining In-Situ and In-Transit Processing to Enable Extreme-Scale Scientific Analysis, SC 12, Salt Lake City, Utah, November,
16 In-situ/In-transit Data Analytics Online data analytics for large-scale simulation workflows Combine both in-situ and in-transit placement to reduce impact on simulation, reduce network data movement, reduce total execution time Adaptive runtime management: Optimize the performance of simulation-analysis workflow through adaptive data and analysis placement, data-centric task mapping primary resources Simulation asynchronous in-situ to in-transit data transfers secondary resources In-transit Task Executors shared data structures In-situ processing data ready Pull task & data descriptors Workflow Manager Figure. Overview of the in-situ/in-transit data analysis framework Figure. System architecture for cross-layer runtime adaptation
17 Simulation case study with S3D: Timing results for 4896 cores and analysis every simulation time step S3D& data&movement& 1.69& hybrid&sta3s3cs& 1.7& 0.73& hybrid&visualiza3on& 5.07& hybrid&topology& 2.72& 2.06& & simula3on& 16.85& 0& 2& 4& 6& 8& 10& 12& 14& 16& seconds&
18 Simulation case study with S3D: Timing results for 4896 cores and analysis every simulation time step S3D& data&movement& 1.69& 10.0% hybrid&sta3s3cs& 1.7& 10.1% 0.73& 4.3% hybrid&visualiza3on& 5.07&.04% hybrid&topology& 2.72& 2.06& & 16.1% simula3on& 16.85& 0& 2& 4& 6& 8& 10& 12& 14& 16& seconds&
19 Simulation case study with S3D: Timing results for 4896 cores and analysis every 10 th simulation time step S3D& in>situ& data&movement& in>transit&& in>situ&sta1s1cs& hybrid&sta1s1cs& in>situ&visualiza1on& hybrid&visualiza1on& 1.0% 1.0%.43%.004% hybrid&topology& & 1.61% simula1on& 168.5& 0& 20& 40& 60& 80& 100& 120& 140& 160& seconds&
20 Simulation case study with S3D: Timing results for 4896 cores and analysis every 100 th simulation time step S3D% in<situ% data%movement% in<transit%% in<situ%sta/s/cs% hybrid%sta/s/cs% in<situ%visualiza/on% hybrid%visualiza/on% hybrid%topology%.1%.1%.04%.004%.16% simula/on% 1685% 0% 200% 400% 600% 800% 1000% 1200% 1400% 1600% seconds%
21 In-Situ Feature Extraction and Tracking using Decentralized Online Clustering (DISC 12) DOC workers executed in- situ on simula%on machines DOC Overlay SimulaHon Compute Nodes Processor core runs simulahon Processor core runs DOC worker Benefits of runtime feature extraction and tracking (1) Scientists can follow the events of interest (or data of interest) (2) Scientists can do real-time monitoring of the running simulations One compute node
22 In-situ viz. and monitoring with staging (SC 12) Pixie3D 1024 cores Pixplot 8 cores ParaView Server 4 cores record.bp record.bp pixie3d.bp pixie3d.bp DataSpaces Pixmon 1 core (login node)
23 Scalable Online Data Indexing and Querying (BDAC 14, CCPE 15) Enable query-driven data analytics to extract insights from the large volume of scientific simulation data Query1 Hilbert SFC Index- 1 Coord- based Query2 Coord- based Z- order SFC Bitmap Index Hash: H1 Hash: H2 Hash: H3 Index- 2 Index- 3 Index- 4 Query3 Value- based Hash: H4 Query4 Tree Index Value- based N 1 N 2 N 3 N 4 Figure. Conceptual view of the programmable online indexing & querying using multiple index Figure. Value-based online index and query framework using FastBit
24 Multi-tiered Data Staging with Autonomic Data Placement Adaptation Overview (IPDPS 15) Fig. Conceptual framework for realizing multi-tiered staging for coupled dataintensive simulation workflows. A multi-tiered data staging approach that uses both DRAM and SSD to support data-intensive simulation workflows with large data coupling requirements. Efficient application-aware data placement mechanism that leverages user provided data access pattern information
25 Wide-area Staging (Demo at SC 14) Wide-area shared staging abstraction achieved by interconnecting multiple staging instances Leverages SDN for provisioning; Workflow/Grid technologies Data placement optimized based on demand, availability, locality, etc. App Dataspaces WA Control Node App TCP/IP get()/sub() put()/ pub() ADIOS/ Dataspaces ADIOS/ Dataspaces RDMA, Gridftp Data Tx/ Rx Node
26 Combus%on: S3D uses chemical model to simulate combushon process in order to understand the ignihon characterishc of bio- fuels for automohve usage. PublicaHon: SC Fusion: EPSi simulahon workflow provides insight into edge plasma physics in magnehc fusion devices by simulahng movement of millions parhcles. PublicaHon: CCGrid 2010, Cluster 2014 Subsurface modeling: IPARS uses mulh- physics, mulh- model formulahons and coupled mulh- block grids to support oil reservoir simulahon of realishc, high- resoluhon reservoir studies with a million or more grids. PublicaHon: CCGrid 2011 Computa%onal mechanics: FEM coupled with AMR is oben used to solve heat transfer or fluid dynamic problems. AMR only performs refinements on important region of the mesh. (work in progress) Material Science: SNS workflow enables integrahon of data, analysis, and modeling tools for materials science experiments to accelerate scienhfic discoveries which requires beam lines data query capability. (work in progress) Material Science: QMCPack uhlizes quantum Monte Carlo (QMC) studies in heterogeneous catalysis of transihon metal nanoparhcles, phase transihons, properhes of materials under pressure, and strongly correlated materials. (work in progress)
27 Summary & Conclusions Complex applications running on high-end systems generate extreme amounts of data that must be managed and analyzed to get insights Data costs (performance, latency, energy) are quickly dominating Traditional data management/analytics pipelines are breaking down Hybrid data staging, in-situ workflow execution, etc. can address this challenges Users to efficiently intertwine applications, libraries, middleware for complex analytics Many challenges; Programming, mapping and scheduling, control and data flow, autonomic runtime management. The DataSpaces project explores solutions at various levels
28 Thank You!! EPSi Edge Physics Simulation Manish Parashar WWW: parashar.rutgers.edu WWW: dataspaces.org
29 DataSpaces RU 39
Addressing Big Data Challenges in Simulation-based Science
Addressing Big Data Challenges in Simulation-based Science Manish Parashar* Rutgers Discovery Informatics Institute (RDI2) NSF Center for Cloud and Autonomic Computing (CAC) Department of Electrical and
More informationBig Data Challenges in Simulation-based Science
Big Data Challenges in Simulation-based Science Manish Parashar Rutgers Discovery Informatics Institute (RDI 2 ) NSF Center for Cloud and Autonomic Computing (CAC) http://parashar.rutgers.edu *Hoang Bui,
More informationBig Data Challenges in Simulation-based Science
Big Data Challenges in Simulation-based Science Manish Parashar* Rutgers Discovery Informatics Institute (RDI2) NSF Center for Cloud and Autonomic Computing (CAC) Department of Electrical and Computer
More informationAddressing Crosscutting Compute/Data Challenges
Rutgers Discovery Informatics Institute (RDI 2 ) New Jersey s Center for Advanced Computation Addressing Crosscutting Compute/Data Challenges Manish Parashar parashar@rutgers.edu Manish Parashar, Rutgers
More informationData Centric Systems (DCS)
Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems
More informationVisualization and Data Analysis
Working Group Outbrief Visualization and Data Analysis James Ahrens, David Rogers, Becky Springmeyer Eric Brugger, Cyrus Harrison, Laura Monroe, Dino Pavlakos Scott Klasky, Kwan-Liu Ma, Hank Childs LLNL-PRES-481881
More informationIn-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps. Yu Su, Yi Wang, Gagan Agrawal The Ohio State University
In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps Yu Su, Yi Wang, Gagan Agrawal The Ohio State University Motivation HPC Trends Huge performance gap CPU: extremely fast for generating
More informationTowards Scalable Visualization Plugins for Data Staging Workflows
Towards Scalable Visualization Plugins for Data Staging Workflows David Pugmire Oak Ridge National Laboratory Oak Ridge, Tennessee James Kress University of Oregon Eugene, Oregon Jeremy Meredith, Norbert
More informationDr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED
Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory 21 st Century Research Continuum Theory Theory embodied in computation Hypotheses tested through experiment SCIENTIFIC METHODS
More informationHPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk
HPC and Big Data EPCC The University of Edinburgh Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk EPCC Facilities Technology Transfer European Projects HPC Research Visitor Programmes Training
More informationBuilding Platform as a Service for Scientific Applications
Building Platform as a Service for Scientific Applications Moustafa AbdelBaky moustafa@cac.rutgers.edu Rutgers Discovery Informa=cs Ins=tute (RDI 2 ) The NSF Cloud and Autonomic Compu=ng Center Department
More informationThe Fusion of Supercomputing and Big Data. Peter Ungaro President & CEO
The Fusion of Supercomputing and Big Data Peter Ungaro President & CEO The Supercomputing Company Supercomputing Big Data Because some great things never change One other thing that hasn t changed. Cray
More informationHPC & Visualization. Visualization and High-Performance Computing
HPC & Visualization Visualization and High-Performance Computing Visualization is a critical step in gaining in-depth insight into research problems, empowering understanding that is not possible with
More informationBSC vision on Big Data and extreme scale computing
BSC vision on Big Data and extreme scale computing Jesus Labarta, Eduard Ayguade,, Fabrizio Gagliardi, Rosa M. Badia, Toni Cortes, Jordi Torres, Adrian Cristal, Osman Unsal, David Carrera, Yolanda Becerra,
More informationSo#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell
So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell R&D Manager, Scalable System So#ware Department Sandia National Laboratories is a multi-program laboratory managed and
More informationExploring Software Defined Federated Infrastructures for Science
Exploring Software Defined Federated Infrastructures for Science Manish Parashar NSF Cloud and Autonomic Computing Center (CAC) Rutgers Discovery Informatics Institute (RDI 2 ) Rutgers, The State University
More informationBig Data Management in the Clouds and HPC Systems
Big Data Management in the Clouds and HPC Systems Hemera Final Evaluation Paris 17 th December 2014 Shadi Ibrahim Shadi.ibrahim@inria.fr Era of Big Data! Source: CNRS Magazine 2013 2 Era of Big Data! Source:
More informationBig Data Challenges in Bioinformatics
Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres Jordi.Torres@bsc.es Talk outline! We talk about Petabyte?
More informationData Semantics Aware Cloud for High Performance Analytics
Data Semantics Aware Cloud for High Performance Analytics Microsoft Future Cloud Workshop 2011 June 2nd 2011, Prof. Jun Wang, Computer Architecture and Storage System Laboratory (CASS) Acknowledgement
More informationPACE Predictive Analytics Center of Excellence @ San Diego Supercomputer Center, UCSD. Natasha Balac, Ph.D.
PACE Predictive Analytics Center of Excellence @ San Diego Supercomputer Center, UCSD Natasha Balac, Ph.D. Brief History of SDSC 1985-1997: NSF national supercomputer center; managed by General Atomics
More informationNew Jersey Big Data Alliance
Rutgers Discovery Informatics Institute (RDI 2 ) New Jersey s Center for Advanced Computation New Jersey Big Data Alliance Manish Parashar Director, Rutgers Discovery Informatics Institute (RDI 2 ) Professor,
More informationHPC technology and future architecture
HPC technology and future architecture Visual Analysis for Extremely Large-Scale Scientific Computing KGT2 Internal Meeting INRIA France Benoit Lange benoit.lange@inria.fr Toàn Nguyên toan.nguyen@inria.fr
More informationHank Childs, University of Oregon
Exascale Analysis & Visualization: Get Ready For a Whole New World Sept. 16, 2015 Hank Childs, University of Oregon Before I forget VisIt: visualization and analysis for very big data DOE Workshop for
More informationData-intensive HPC: opportunities and challenges. Patrick Valduriez
Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,
More informationCluster, Grid, Cloud Concepts
Cluster, Grid, Cloud Concepts Kalaiselvan.K Contents Section 1: Cluster Section 2: Grid Section 3: Cloud Cluster An Overview Need for a Cluster Cluster categorizations A computer cluster is a group of
More informationIBM Deep Computing Visualization Offering
P - 271 IBM Deep Computing Visualization Offering Parijat Sharma, Infrastructure Solution Architect, IBM India Pvt Ltd. email: parijatsharma@in.ibm.com Summary Deep Computing Visualization in Oil & Gas
More informationThe Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets
The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and
More informationOutline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
More informationHadoopTM Analytics DDN
DDN Solution Brief Accelerate> HadoopTM Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award winning SFA platform to really accelerate
More informationHadoop on the Gordon Data Intensive Cluster
Hadoop on the Gordon Data Intensive Cluster Amit Majumdar, Scientific Computing Applications Mahidhar Tatineni, HPC User Services San Diego Supercomputer Center University of California San Diego Dec 18,
More informationCloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research
Cloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research Trends: Data on an Exponential Scale Scientific data doubles every year Combination of inexpensive sensors + exponentially
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationProject Convergence: Integrating Data Grids and Compute Grids. Eugene Steinberg, CTO Grid Dynamics May, 2008
Project Convergence: Integrating Data Grids and Compute Grids Eugene Steinberg, CTO May, 2008 Data-Driven Scalability Challenges in HPC Data is far away Latency of remote connection Latency of data movement
More informationIn situ data analysis and I/O acceleration of FLASH astrophysics simulation on leadership-class system using GLEAN
In situ data analysis and I/O acceleration of FLASH astrophysics simulation on leadership-class system using GLEAN Venkatram Vishwanath 1, Mark Hereld 1, Michael E. Papka 1, Randy Hudson 2, G. Cal Jordan
More informationHarnessing the Potential of Data Scientists and Big Data for Scientific Discovery
Harnessing the Potential of Data Scientists and Big Data for Scientific Discovery Ed Lazowska, University of Washington Saul Perlmu=er, UC Berkeley Yann LeCun, New York University Josh Greenberg, Alfred
More informationHigh Performance Computing OpenStack Options. September 22, 2015
High Performance Computing OpenStack PRESENTATION TITLE GOES HERE Options September 22, 2015 Today s Presenters Glyn Bowden, SNIA Cloud Storage Initiative Board HP Helion Professional Services Alex McDonald,
More informationParallel Analysis and Visualization on Cray Compute Node Linux
Parallel Analysis and Visualization on Cray Compute Node Linux David Pugmire, Oak Ridge National Laboratory and Hank Childs, Lawrence Livermore National Laboratory and Sean Ahern, Oak Ridge National Laboratory
More informationBig Data Mining Services and Knowledge Discovery Applications on Clouds
Big Data Mining Services and Knowledge Discovery Applications on Clouds Domenico Talia DIMES, Università della Calabria & DtoK Lab Italy talia@dimes.unical.it Data Availability or Data Deluge? Some decades
More informationPARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN
1 PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster Construction
More informationUse Cases for Large Memory Appliance/Burst Buffer
Use Cases for Large Memory Appliance/Burst Buffer Rob Neely Bert Still Ian Karlin Adam Bertsch LLNL-PRES- 648613 This work was performed under the auspices of the U.S. Department of Energy by under contract
More informationBig Data and Science: Myths and Reality
Big Data and Science: Myths and Reality H.V. Jagadish http://www.eecs.umich.edu/~jag Six Myths about Big Data It s all hype It s all about size It s all analysis magic Reuse is easy It s the same as Data
More informationAlternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix
Alternative Deployment Models for Cloud Computing in HPC Applications Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix The case for Cloud in HPC Build it in house Assemble in the cloud?
More informationHPC Trends and Directions in the Earth Sciences Per Nyberg nyberg@cray.com Sr. Director, Business Development
HPC Trends and Directions in the Earth Sciences Per Nyberg nyberg@cray.com Sr. Director, Business Development Cray Solutions for the Earth Sciences Why Cray? Cray solutions support: Range of modeling capabilities
More informationData Requirements from NERSC Requirements Reviews
Data Requirements from NERSC Requirements Reviews Richard Gerber and Katherine Yelick Lawrence Berkeley National Laboratory Summary Department of Energy Scientists represented by the NERSC user community
More informationMake the Most of Big Data to Drive Innovation Through Reseach
White Paper Make the Most of Big Data to Drive Innovation Through Reseach Bob Burwell, NetApp November 2012 WP-7172 Abstract Monumental data growth is a fact of life in research universities. The ability
More informationAdvanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
More informationEOFS Workshop Paris Sept, 2011. Lustre at exascale. Eric Barton. CTO Whamcloud, Inc. eeb@whamcloud.com. 2011 Whamcloud, Inc.
EOFS Workshop Paris Sept, 2011 Lustre at exascale Eric Barton CTO Whamcloud, Inc. eeb@whamcloud.com Agenda Forces at work in exascale I/O Technology drivers I/O requirements Software engineering issues
More informationScheduling and Load Balancing in the Parallel ROOT Facility (PROOF)
Scheduling and Load Balancing in the Parallel ROOT Facility (PROOF) Gerardo Ganis CERN E-mail: Gerardo.Ganis@cern.ch CERN Institute of Informatics, University of Warsaw E-mail: Jan.Iwaszkiewicz@cern.ch
More informationEinsatzfelder von IBM PureData Systems und Ihre Vorteile.
Einsatzfelder von IBM PureData Systems und Ihre Vorteile demirkaya@de.ibm.com Agenda Information technology challenges PureSystems and PureData introduction PureData for Transactions PureData for Analytics
More informationParallel Large-Scale Visualization
Parallel Large-Scale Visualization Aaron Birkland Cornell Center for Advanced Computing Data Analysis on Ranger January 2012 Parallel Visualization Why? Performance Processing may be too slow on one CPU
More informationHPC enabling of OpenFOAM R for CFD applications
HPC enabling of OpenFOAM R for CFD applications Towards the exascale: OpenFOAM perspective Ivan Spisso 25-27 March 2015, Casalecchio di Reno, BOLOGNA. SuperComputing Applications and Innovation Department,
More informationFlash Memory Arrays Enabling the Virtualized Data Center. July 2010
Flash Memory Arrays Enabling the Virtualized Data Center July 2010 2 Flash Memory Arrays Enabling the Virtualized Data Center This White Paper describes a new product category, the flash Memory Array,
More informationIntegration of Supercomputers and Data-Warehouse Appliances
August, 2013 Integration of Supercomputers and Data-Warehouse Appliances Approved for Public Release: SAND 2013-6833 C Ron A. Oldfield, George Davidson, Craig Ulmer, Andy Wilson Sandia National Laboratories
More informationAgenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC
HPC Architecture End to End Alexandre Chauvin Agenda HPC Software Stack Visualization National Scientific Center 2 Agenda HPC Software Stack Alexandre Chauvin Typical HPC Software Stack Externes LAN Typical
More informationHigh Performance Computing. Course Notes 2007-2008. HPC Fundamentals
High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs
More informationData, Computation, and the Fate of the Universe
Data, Computation, and the Fate of the Universe Saul Perlmutter University of California, Berkeley Lawrence Berkeley National Laboratory NERSC Lunchtime Nobel Keynote Lecture June 2014 Atomic Matter 4%
More informationPanasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory
Customer Success Story Los Alamos National Laboratory Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory June 2010 Highlights First Petaflop Supercomputer
More informationParallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications
More informationKriterien für ein PetaFlop System
Kriterien für ein PetaFlop System Rainer Keller, HLRS :: :: :: Context: Organizational HLRS is one of the three national supercomputing centers in Germany. The national supercomputing centers are working
More informationfor my computation? Stefano Cozzini Which infrastructure Which infrastructure Democrito and SISSA/eLAB - Trieste
Which infrastructure Which infrastructure for my computation? Stefano Cozzini Democrito and SISSA/eLAB - Trieste Agenda Introduction:! E-infrastructure and computing infrastructures! What is available
More informationSimplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions
Simplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions 64% of organizations were investing or planning to invest on Big Data technology
More informationCYBERINFRASTRUCTURE FRAMEWORK FOR 21 st CENTURY SCIENCE AND ENGINEERING (CIF21)
CYBERINFRASTRUCTURE FRAMEWORK FOR 21 st CENTURY SCIENCE AND ENGINEERING (CIF21) Goal Develop and deploy comprehensive, integrated, sustainable, and secure cyberinfrastructure (CI) to accelerate research
More informationCollaborative Project in Cloud Computing
Collaborative Project in Cloud Computing Project Title: Virtual Cloud Laboratory (VCL) Services for Education Transfer of Results: from North Carolina State University (NCSU) to IBM Technical Description:
More informationPerformance Monitoring of Parallel Scientific Applications
Performance Monitoring of Parallel Scientific Applications Abstract. David Skinner National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory This paper introduces an infrastructure
More informationSupercomputing on Windows. Microsoft (Thailand) Limited
Supercomputing on Windows Microsoft (Thailand) Limited W hat D efines S upercom puting A lso called High Performance Computing (HPC) Technical Computing Cutting edge problems in science, engineering and
More informationScaling Objectivity Database Performance with Panasas Scale-Out NAS Storage
White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage
More informationDISTRIBUTED SYSTEMS AND CLOUD COMPUTING. A Comparative Study
DISTRIBUTED SYSTEMS AND CLOUD COMPUTING A Comparative Study Geographically distributed resources, such as storage devices, data sources, and computing power, are interconnected as a single, unified resource
More informationNext-Generation Networking for Science
Next-Generation Networking for Science ASCAC Presentation March 23, 2011 Program Managers Richard Carlson Thomas Ndousse Presentation
More informationWITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE
WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE 1 W W W. F U S I ON I O.COM Table of Contents Table of Contents... 2 Executive Summary... 3 Introduction: In-Memory Meets iomemory... 4 What
More informationBlobSeer: Towards efficient data storage management on large-scale, distributed systems
: Towards efficient data storage management on large-scale, distributed systems Bogdan Nicolae University of Rennes 1, France KerData Team, INRIA Rennes Bretagne-Atlantique PhD Advisors: Gabriel Antoniu
More informationJean-Pierre Panziera Teratec 2011
Technologies for the future HPC systems Jean-Pierre Panziera Teratec 2011 3 petaflop systems : TERA 100, CURIE & IFERC Tera100 Curie IFERC 1.25 PetaFlops 256 TB ory 30 PB disk storage 140 000+ Xeon cores
More informationPentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System
Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance
More informationData Centric Interactive Visualization of Very Large Data
Data Centric Interactive Visualization of Very Large Data Bruce D Amora, Senior Technical Staff Gordon Fossum, Advisory Engineer IBM T.J. Watson Research/Data Centric Systems #OpenPOWERSummit Data Centric
More informationConquering the Astronomical Data Flood through Machine
Conquering the Astronomical Data Flood through Machine Learning and Citizen Science Kirk Borne George Mason University School of Physics, Astronomy, & Computational Sciences http://spacs.gmu.edu/ The Problem:
More informationApplication of Grid-Enabled Technologies for Solving Optimization Problems in Data Driven Reservoir Systems
Application of Grid-Enabled Technologies for Solving Optimization Problems in Data Driven Reservoir Systems M. Parashar, H. Klie, U. Catalyurek, T. Kurc, V. Matossian, J. Saltz, M.F. Wheeler ITR Collaborators
More informationMr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo
Sensor Network Messaging Service Hive/Hadoop Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo Contents 1 Introduction 2 What & Why Sensor Network
More informationAnalyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution
Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution Jonathan Halstuch, COO, RackTop Systems JHalstuch@racktopsystems.com Big Data Invasion We hear so much on Big Data and
More informationGlobus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago
Globus Striped GridFTP Framework and Server Raj Kettimuthu, ANL and U. Chicago Outline Introduction Features Motivation Architecture Globus XIO Experimental Results 3 August 2005 The Ohio State University
More informationWHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression
WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression Sponsored by: Oracle Steven Scully May 2010 Benjamin Woo IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA
More informationCloud Computing and the Future of Internet Services. Wei-Ying Ma Principal Researcher, Research Area Manager Microsoft Research Asia
Cloud Computing and the Future of Internet Services Wei-Ying Ma Principal Researcher, Research Area Manager Microsoft Research Asia Computing as Utility Grid Computing Web Services in the Cloud What is
More informationJOURNAL OF OBJECT TECHNOLOGY
JOURNAL OF OBJECT TECHNOLOGY Online at www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2008 Vol. 7, No. 8, November-December 2008 What s Your Information Agenda? Mahesh H. Dodani,
More informationThe Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland
The Lattice Project: A Multi-Model Grid Computing System Center for Bioinformatics and Computational Biology University of Maryland Parallel Computing PARALLEL COMPUTING a form of computation in which
More informationThe Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT): A Vision for Large-Scale Climate Data
The Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT): A Vision for Large-Scale Climate Data Lawrence Livermore National Laboratory? Hank Childs (LBNL) and Charles Doutriaux (LLNL) September
More informationDistributed Systems LEEC (2005/06 2º Sem.)
Distributed Systems LEEC (2005/06 2º Sem.) Introduction João Paulo Carvalho Universidade Técnica de Lisboa / Instituto Superior Técnico Outline Definition of a Distributed System Goals Connecting Users
More informationAppendix 1 ExaRD Detailed Technical Descriptions
Appendix 1 ExaRD Detailed Technical Descriptions Contents 1 Application Foundations... 3 1.1 Co- Design... 3 1.2 Applied Mathematics... 5 1.3 Data Analytics and Visualization... 8 2 User Experiences...
More informationHow To Write A Data Mining Program On A Large Data Set
On the Role of Indexing for Big Data in Scientific Domains Arie Shoshani Lawrence Berkeley National Lab BIGDATA and EXTREME-SCALE COMPUTING April 3-May, 23 Outline q Examples of indexing needs in scientific
More information2. Research and Development on the Autonomic Operation. Control Infrastructure Technologies in the Cloud Computing Environment
R&D supporting future cloud computing infrastructure technologies Research and Development on Autonomic Operation Control Infrastructure Technologies in the Cloud Computing Environment DEMPO Hiroshi, KAMI
More informationHPCOR 2014. Data Management Policies Co-Chairs: Bill Allcock, Julia White. HPCOR 2014, June 18-19, Oakland, CA 1
HPCOR 2014 Data Management Policies Co-Chairs: Bill Allcock, Julia White HPCOR 2014, June 18-19, Oakland, CA 1 Contributors Bill Allcock, ANL Sasha Ames, LLNL Ashley Barker, ORNL Laura Biven, DOE Jeff
More informationAppro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales
Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes Anthony Kenisky, VP of North America Sales About Appro Over 20 Years of Experience 1991 2000 OEM Server Manufacturer 2001-2007
More informationImprove Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database
WHITE PAPER Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive
More informationThe PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver
1 The PHI solution Fujitsu Industry Ready Intel XEON-PHI based solution SC2013 - Denver Industrial Application Challenges Most of existing scientific and technical applications Are written for legacy execution
More informationBringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the
More informationBandwidth Modeling in Large Distributed Systems for Big Data Applications
Bandwidth Modeling in Large Distributed Systems for Big Data Applications Bahman Javadi School of Computing, Engineering and Mathematics University of Western Sydney, Australia Email: b.javadi@uws.edu.au
More informationReference Architecture, Requirements, Gaps, Roles
Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture
More informationwww.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING
www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING GPU COMPUTING VISUALISATION XENON Accelerating Exploration Mineral, oil and gas exploration is an expensive and challenging
More informationNA-ASC-128R-14-VOL.2-PN
L L N L - X X X X - X X X X X Advanced Simulation and Computing FY15 19 Program Notebook for Advanced Technology Development & Mitigation, Computational Systems and Software Environment and Facility Operations
More informationManjrasoft Market Oriented Cloud Computing Platform
Manjrasoft Market Oriented Cloud Computing Platform Innovative Solutions for 3D Rendering Aneka is a market oriented Cloud development and management platform with rapid application development and workload
More informationEnabling In-situ Execution of Coupled Scientific Workflow on Multi-core Platform
212 IEEE 26th International Parallel and Distributed Processing Symposium Enabling In-situ Execution of Coupled Scientific Workflow on Multi-core Platform Fan Zhang, Ciprian Docan, Manish Parashar Center
More informationManaging Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery
Center for Information Services and High Performance Computing (ZIH) Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery Richard Grunzke*, Jens Krüger, Sandra Gesing, Sonja
More informationIO Informatics The Sentient Suite
IO Informatics The Sentient Suite Our software, The Sentient Suite, allows a user to assemble, view, analyze and search very disparate information in a common environment. The disparate data can be numeric
More information