Algorithmic Challenges and Opportunities for Data Analysis and Visualization in the Co-design Process
1 Algorithmic Challenges and Opportunities for Data Analysis and Visualization in the Co-design Process Hasan Abbasi, Janine Bennett, Peer-Timo Bremer, Varis Carey, Greg Eisenhauer, Attila Gyulassy, Scott Klasky, Robert Moser, Todd Oliver, Manish Parashar, Valerio Pascucci, Karsten Schwan, Hongfeng Yu, and Matthew Wolf
2 SDMA challenges in extreme-scale computing
3 Combustion Workflow: RHS of the S3D solver at each stage of an explicit time step. Asynchronous movement of data, or sharing of data in memory (at different levels). In situ, in transit data analysis/viz workflow via hybrid staging.
4 We are Building Proxy and Skeletal Apps that Enable Empirical Evaluation of Codesign Design Choices Proxy App for Topology driven feature extraction Proxy App for Topology driven feature tracking Proxy App for Statistical analysis Proxy App for Visualization Proxy App for Uncertainty Quantification Skeletal Apps for Staging and Data Movements
5 Codesign Questions for Data Analysis and Visualization Algorithms: How much memory will be available in situ, and with what characteristics? Will we have hardware and runtime support for asynchronous computation in situ? Will performance for small messages be reduced? What is the ratio of network bandwidth for in situ vs. in transit communication? How well will modern processors support code that is branch-heavy and FLOP-free?
6 A Wide Range of Analysis and Visualization Algorithms Are Needed for Combustion Applications: in situ multi-variate volume and particle rendering; Lagrangian particle querying and analysis; topological segmentation (contour trees, Morse-Smale complex, time tracking); scalar field comparison; distance field (level set); filtering and averaging (spatial and temporal); shape analysis; statistical moments (conditional); statistical dimensionality reduction (joint PDFs); spectra (scalar, velocity, coherency); flame-centric control volume analysis.
7 We Build Reduced Topological Models for Characterization and Tracking of Combustion Features. (Diagram: domain hierarchy; feature events between t and t + Δt: birth, death, split, continuation.)
8 Merge Trees Represent Feature Extraction at Different Scales with Thresholds for Noise Removal
9 Visualization and Analysis Bottlenecks May Differ from Simulation Bottlenecks. Typically I/O bound: limited by the rate at which data can be accessed. Memory layout may significantly impact efficiency, beyond the traditional cache effects seen in solvers. FLOP vs. branch ratios can vary dramatically, and algorithms with many branches are highly data-dependent: feature density (how much of the data is relevant) and feature distribution (how balanced the distributed workload is). It is difficult to reliably predict expected behavior in general.
10 Execution Models and Data Movements Depend on Different Flavors of Hybrid Staging. In situ: co-located (complete resource sharing/contention with the simulation); partial sharing (different processor/core, less resource contention); out of band (on node with minimal resource usage, e.g. use of out-of-core techniques, low priority). In transit: local (sending data to different nodes of the same machine); remote (sending data to a separate machine).
11 The Codesign Process Provides a Unique Opportunity to Study the Algorithmic Design Space
12 Multi-scale Design Patterns and Execution Models Lead to Local and Global Algorithm Design Parameters. Global design parameters include: number of execution units; data aggregation patterns; N-to-1 communication patterns; use of pair-wise data exchange schemes. Local design parameters are algorithm-specific: sort first, filter last; filter first, sort last; filter first, traverse last.
13 We can project behaviors for different design parameters and execution models onto prospective hardware configurations. SST and spreadsheet models provide a projection of communication and wall time. Prospective configurations: PIM (processing in memory, cache-less architecture); exanode1 (commodity NIC and memory); exanode2 (custom on-board NIC + faster memory). ExaCT tools: performance analysis spreadsheet.
14 We are exploring 3 use cases that cover a range of characteristic behaviors.
Statistics: local compute behavior is two phases, all FLOPs; complexity is not data-dependent; amount of data transferred for aggregation/gather is constant and small; scatter sometimes required (small data).
Visualization: local compute behavior is two phases, some FLOPs; complexity is data-dependent; amount of data transferred can be data-dependent; no scatter required.
Topology: local compute behavior is three phases, two FLOP-free and one FLOP-heavy; complexity is data-dependent; amount of data transferred is data-dependent; scatter required (small data).
15 Topology Algorithm. Computation of local features: process vertices in sorted order, detecting the joining of iso-surface components; uses a Union-Find data structure. Communication to resolve features spanning multiple blocks: local merge trees are communicated in N-to-1 merges; corrections to the local trees are re-broadcast to the local compute nodes. Feature-based statistics computation: the segmentation stored with the corrected local merge trees is used to compute per-feature statistics (averages, size, shape).
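As a rough illustration of the first step, the sketch below (not the project's proxy app; all names are illustrative) processes vertices in descending order and uses union-find to record when superlevel-set components are born and when they merge:

```python
# Hypothetical sketch of the local-feature step: sweep vertices in
# decreasing scalar order and use union-find to detect component births
# and merges. Names and data layout are illustrative, not the proxy app's.

def find(parent, v):
    # Find the component root, with path halving for efficiency.
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v

def merge_tree_events(values, neighbors):
    """values: dict vertex -> scalar; neighbors: dict vertex -> iterable.
    Returns a list of ('birth', v) and ('merge', v, root_a, root_b) events."""
    parent = {}
    events = []
    for v in sorted(values, key=lambda u: -values[u]):
        parent[v] = v                      # new vertex starts its own component
        roots = {find(parent, n) for n in neighbors[v] if n in parent}
        if not roots:
            events.append(('birth', v))    # local maximum: a feature is born
        else:
            for r in roots:
                parent[r] = v              # attach existing components at v
            if len(roots) > 1:
                events.append(('merge', v, *sorted(roots)))
    return events
```

Running this over one block of the domain yields the local merge-tree events; the resulting local trees would then be combined via the N-to-1 merges described above.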
16 Basic Topology Measurements Obtained with Byfl: basic analysis for on-node computation (data size 560x560x560). (Table: number of cores, number of points, points per core, total loads (MB), total stores (MB), total FLOPs, and loads per core (MB).)
17 Topology Design Space Exploration.
Computation of local features, alternatives (feature: advantage; disadvantage):
Sort-first: efficient (possible GPU); O(n) memory for sorted indices.
Progressive-sort: smaller memory footprint; extra pass over the data.
Union-Find: efficient; O(n) Union-Find data structure.
Streaming computation: re-use of memory for the tree; significantly slower.
Communication alternatives:
N-to-1 gather then 1-to-N scatter: simple communication model; higher latency.
N-to-1 gather interleaved with 1-to-N scatter: less idle time on compute nodes; requires interleaved compute/communication.
N-to-1 gather interleaved with 1-to-leaves scatter: less idle time on compute nodes; requires streaming compute and asynchronous communication.
Feature-based statistics alternatives:
Wait for corrections, then compute: simple; higher latency.
Compute-and-correct: lower latency; re-does some work, more complex communicators.
18 Example execution model 1: in situ data transfers on node or on network. Topology computation directly integrated with the solver; K rounds of merges on in-situ processes. Factors analyzed: number of nodes; number of cores per node; number of merge operations per stage; initial data access through a shared pointer. Free parameters: communicate on network first; communicate on network last; size of messages.
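A toy model (not the SST or spreadsheet tool) of how per-stage message counts fall off during the merge rounds, with the fan-in k as a free parameter:

```python
# Illustrative message-count model for a k-way merge over n_procs leaves,
# as in the K rounds of in situ merges. This is a simplifying sketch, not
# a measured communication load.

def messages_per_stage(n_procs, k):
    """Each stage merges groups of k partial trees into one; every
    non-root member of a group sends one message. Returns a list with
    the message count of each stage."""
    counts = []
    active = n_procs
    while active > 1:
        groups = (active + k - 1) // k     # ceil: last group may be partial
        counts.append(active - groups)     # one message per non-root member
        active = groups
    return counts

print(messages_per_stage(1024, 2))  # binary merges: 512, 256, ..., 1
print(messages_per_stage(1024, 8))  # 8-way merges: fewer, larger stages
```

With k=8 the reduction finishes in far fewer stages than with binary merges, at the cost of larger per-stage gathers, which mirrors the binary vs. 8-way comparison in the plots that follow.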
19 Communication Loads for Different Data Transfer Patterns. (Plots: message size and message count vs. merge stage for 2, 4, and 8 cores per node.) Data size: 2025x1600x400, Kay=0.31, binary merges.
20 Communication Loads for Different Data Transfer Patterns. (Plots: message size and message count vs. merge stage for 2, 4, and 8 cores per node.) Data size: 2025x1600x400, Kay=0.31, 8-way merges.
21 Example execution model 2: part in situ and part in transit. Initial local compute directly integrated with the solver; K rounds of merges on in-situ processes; (N-K) rounds of merging in the staging area. Predicted cost factors: initial data access through a shared pointer; K rounds of merges in blocking mode as part of the solver code, including shared-memory (on-node) and MPI-message (off-node) communication; data transfer to the staging area; asynchronous computation in the staging area. Free parameters: merging strategy and staging-area break. Things to watch out for: the initial data transfer might pollute the cache; the in-situ merge becomes sparse quickly.
22 Communication Loads for Different Data Transfer Patterns. (Plots: total communication vs. merge stage for 2, 4, and 8 cores per node, binary merge and 8-way merge.) Data size: 2025x1600x400.
23 Statistics Algorithm. 1st-4th order moments, variance, skewness, kurtosis, and minimum and maximum values are commonly computed by physics codes. Pair-wise update formulas for the 1st-4th order moments allow for a single-pass distributed implementation: given moment(A) and moment(B), compute moment(A ∪ B). The global model can optionally be scattered to all processes to allow for assessment of observations (e.g. to determine outliers).
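A minimal sketch of the pair-wise update idea, shown here only up to second order (the deck's formulas extend to fourth order); `local_moments` and `merge_moments` are illustrative names:

```python
# Sketch of pair-wise moment merging: each block accumulates its own
# (count, mean, M2, min, max), and any two disjoint blocks can be
# combined without revisiting the raw data. Second order only; the
# 3rd/4th-order updates follow the same pattern.

def local_moments(xs):
    n, mean, M2 = 0, 0.0, 0.0
    for x in xs:                      # Welford's single-pass accumulation
        n += 1
        d = x - mean
        mean += d / n
        M2 += d * (x - mean)          # M2 = sum of squared deviations
    return (n, mean, M2, min(xs), max(xs))

def merge_moments(a, b):
    """a, b: (count, mean, M2, min, max) for disjoint blocks A and B.
    Returns the same statistics for A ∪ B."""
    na, ma, M2a, lo_a, hi_a = a
    nb, mb, M2b, lo_b, hi_b = b
    n = na + nb
    delta = mb - ma
    mean = ma + delta * nb / n
    M2 = M2a + M2b + delta * delta * na * nb / n
    return (n, mean, M2, min(lo_a, lo_b), max(hi_a, hi_b))
```

Because `merge_moments` is associative, partial results can be combined in a reduction tree of arbitrary depth and width, which is exactly the communication-pattern flexibility the design-space slide exploits.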
24 Statistics Design Space Exploration. The algorithm is mostly FLOPs, is not data-dependent, and requires only small amounts of data to be communicated. Small-scale, algorithm-specific design parameters: none; the local computations are a straightforward implementation of the update formulas. Large-scale design parameters: the update formulas provide complete flexibility in communication patterns and support arbitrary depth/width of the compute tree. Execution model: the initial local compute level is a good candidate for in situ (all data must be transferred otherwise); later local compute levels could be placed anywhere (only very small data sizes are transferred: moments, minima, and maxima).
25 Measurements obtained with Byfl confirm the data-parallel, scalable nature of the statistics algorithms. (Plot: operations per core vs. points per core; ALU and FLOP counts for the HCCI and LEJ data sets. Table: dimensions, number of cores, points/core, loads/core (MB), stores/core (B), FLOPs/core, ALU ops/core, and memory ops/core for the LEJ and HCCI data sets.) Data movement options: all-gather, or gather of 3KB per processor.
26 Visualization Algorithm. Volume rendering of local data to generate partial images: cast a ray from the eye through each pixel of an image; for each ray, sample the local volume, map data into color values via a transfer function, and accumulate the color values. Parallel image compositing to combine the partial images into a final global image: build a communication schedule according to the distribution of pixel data; exchange pixel data via communication; blend the exchanged pixel data.
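A minimal single-ray sketch of the local rendering step; `field`, `transfer`, and the step count are placeholders, and a real renderer vectorizes over many rays:

```python
# Hypothetical sketch of one ray of the volume renderer: sample the
# scalar field along the ray, map samples through a transfer function,
# and accumulate front-to-back with the "over" compositing operator.

def cast_ray(field, origin, direction, n_samples, step, transfer):
    """field(p) -> scalar; transfer(s) -> (r, g, b, alpha) in [0, 1].
    Returns the accumulated (color, alpha) for this ray."""
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    for i in range(n_samples):
        p = tuple(o + d * step * i for o, d in zip(origin, direction))
        r, g, b, a = transfer(field(p))
        w = (1.0 - alpha) * a              # front-to-back "over" weight
        color = [c + w * s for c, s in zip(color, (r, g, b))]
        alpha += w
        if alpha > 0.99:                   # early ray termination
            break
    return color, alpha
```

The parallel compositing stage then blends the partial per-node images with the same "over" operator, ordered by depth along the view direction.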
27 Visualization Design Space Exploration. The algorithm is only marginally FLOP-heavy, can be data-dependent, and requires a potentially large number of messages to be exchanged. Small-scale, algorithm-specific design parameters: adaptive workload of local rendering (features specified by users in transfer-function space, features identified by analysis algorithms, data resolution for different exploration purposes); optimization and acceleration on CPUs and/or GPUs. Large-scale design parameters: optimize the communication schedule according to the pixel data distribution; minimize link contention, pixel data exchanged, and blending cost; exploration using MPI and/or OpenMP. Execution model: local rendering of simulation data could be performed in situ to minimize data movement; local rendering of feature data could be placed anywhere.
28 Visualization Analysis Results. Measurements obtained with Byfl confirm the data-parallel, nearly scalable nature of local rendering. The number of operations varies marginally across cores depending on data features and rendering parameters. (Table, measured at a middle image quality: dimensions, number of cores, points/core, loads/core (MB), stores/core (B), FLOPs/core, ALU ops/core, and memory ops/core, with per-core variation percentages, for the LEJ and HCCI data sets.) For un-optimized image compositing, the size of the messages exchanged across cores depends on the image resolution. For the same image resolution, the message size is nearly the same across different numbers of cores. The messages can be reduced via optimization. (Results shown for middle and high image quality.)
29 Successful execution hinges on tight integration with all the co-design components. Separate proxy apps for data analysis and visualization; integration to understand combined behavior; further reduction to build skeletal apps. Coordination with data management: efficient data transfer strategies; where are the biggest reserves in performance, energy, and wall time? Coordination with DSL: improve local compute patterns; fast index computation; elimination of unnecessary branches (boundary cases). UQ analysis: how much persistent memory is required? Modeling capabilities: compilers (Byfl, ROSE); simulation (SST, spreadsheet model). Solvers: study tradeoffs and cache pollution effects (possible sharing of data structures).
30 Uncertainty Quantification within the SDMA workflow
31 UQ and Data Management. The problem: evaluate sensitivities of quantities of interest (QoIs) with respect to chemistry model parameters, or modeled fields (e.g. reaction rate fields). The classical approach to solving this problem requires solving P+1 forward simulations, where P is the number of sensitivity evaluations. P can be very large (>> 1000), which makes the classical approach infeasible. Instead, solve one auxiliary problem, the adjoint problem, which is linearized about the primal solution. The primal (forward) solution is needed to solve the adjoint problem.
32 Solving the Adjoint Problem. The challenge: solving the adjoint problem requires the primal state, and the adjoint problem must be solved backwards in time. Must store the primal state at all time substeps!
33 More Sophisticated Adjoint Solution. To reduce storage requirements: store a limited number of primal states (checkpoints); use the checkpoints to recompute the primal state when needed. Example with two checkpoints:
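The store-and-recompute pattern can be sketched as follows; `step` and `adjoint_step` are stand-ins for the real primal and adjoint solver operators, and the uniform checkpoint stride is a simplifying assumption:

```python
# Sketch of checkpointed adjoint solution: store the primal state every
# `stride` steps during the forward sweep, then regenerate the in-between
# states segment by segment during the backward (adjoint) sweep.

def adjoint_with_checkpoints(u0, n_steps, stride, step, adjoint_step, lam):
    checkpoints = {0: u0}
    u = u0
    for t in range(n_steps):               # forward (primal) sweep
        u = step(u)
        if (t + 1) % stride == 0:
            checkpoints[t + 1] = u         # keep only every stride-th state
    for t in reversed(range(n_steps)):     # backward (adjoint) sweep
        t0 = (t // stride) * stride
        u = checkpoints[t0]
        for _ in range(t - t0):            # recompute primal within segment
            u = step(u)
        lam = adjoint_step(lam, u)         # adjoint is linearized about u
    return lam
```

Storage drops from n_steps states to roughly n_steps/stride, while the forward work is at most doubled; nested (multi-level) checkpointing extends the same idea, as in the combustion numbers on the next slides.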
34 Storing the full primal solution state is prohibitive (e.g. 1PB/state). We are only interested in sensitivities in a limited region in space and time (RoI, e.g. an extinction event), but RoIs are not known a priori. Approach: solve the primal problem and identify RoIs using in situ analysis; re-solve the primal problem, checkpointing state only from the RoIs; solve the adjoint problem only in the RoIs. Example with one Region of Interest.
35 Example from Combustion Use Case. Naïve adjoint solution (store the full state in space and time): storage 5ZB, compute 2 X the primal problem. One-level checkpointing: storage 4EB, compute 3 X the primal problem. Six-level checkpointing: storage 50PB, compute 8 X the primal problem. One-level checkpointing on N RoIs: storage (1+1.3N) PB, compute ( N) X the primal problem. There is an important trade-off between storage and recomputation. A proxy app capturing the adjoint solution data flow is being developed.
36 Data Management
37 Goals for SDMA in ExaCT. 1. Explore data staging techniques to deal with exascale data. 2. Design questions for data staging: where should data for A&V be stored; where should the A&V operations be executed; how should SDMA integrate with the solver; what architectural features can be leveraged for SDMA. 3. Evaluate the design space.
38 The Meta-Skeleton
39 Design Space. Solver proxy feeding Descriptive Stats, Visualization, and Topological Analysis. Data questions: 1. Where do we move the data to? 2. How do we extract data from the solver? 3. What hardware features can be exploited? Execution questions: 1. What processing resources are allocated? 2. How do we schedule the execution of these tasks?
40 Storage scalability and the power wall. Disk sizes of TB; single-disk bandwidth of MB/s; power consumption of ~45 W/disk; system memory 32PB [3]. A full checkpoint every hour => TB/s I/O bandwidth => 277,633 disks => 13 MW of power [1,2], without even considering RAID! 1. Curry, M.L., Ward, H.L., Grider, G., Gemmill, J., Harris, J., and Martinez, D. Power use of disk subsystems in supercomputers. Proceedings of the Sixth Workshop on Parallel Data Storage, Nov. 2. G. Grider. Exa-scale FSIO. HEC-FSIO workshop presentation, August. 3. R. Stevens and A. White. A DOE laboratory plan for providing exascale applications and technologies for critical DOE mission needs. SciDAC Workshop, July 2010.
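The slide's arithmetic can be reproduced as a back-of-envelope calculation; the per-disk sustained bandwidth below (32 MB/s) is an assumed value chosen to be roughly consistent with the quoted disk count:

```python
# Back-of-envelope version of the power-wall arithmetic. The per-disk
# sustained bandwidth is an assumption, not a figure from the slide.

memory_bytes = 32e15            # 32 PB system memory [3]
dump_interval_s = 3600.0        # one full checkpoint per hour
disk_bw = 32e6                  # assumed sustained bytes/s per disk
disk_power_w = 45.0             # ~45 W per disk [1, 2]

required_bw = memory_bytes / dump_interval_s      # ~8.9 TB/s aggregate
n_disks = required_bw / disk_bw                   # ~278,000 disks
power_mw = n_disks * disk_power_w / 1e6           # ~12.5 MW, before RAID

print(f"{required_bw / 1e12:.1f} TB/s, {n_disks:,.0f} disks, {power_mw:.1f} MW")
```

The result lands in the same regime as the slide's 277,633 disks and 13 MW, and RAID overheads would only push it higher.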
41 Synchronous I/O is not the solution. S3D simulation: O(400 PB)/run; O(1M) cores; 1 PB/dump every 30 minutes. Storage space requirements: 35 disks for each dump (no RAID); 1.5 KW per live dump. Performance requirements: 5% overhead, ~31k disks, >1.4 MW; 10% overhead, ~15k disks, >0.65 MW; 50% overhead, ~3k disks, >0.14 MW. Synchronous I/O feeds downstream analysis: MS-Complex; visualization (volume, surface, particle rendering); Isomap.
42 What ExaCTly is Staging? Extra stage(s) in the data pipeline: use available memory resources as a buffer; use available compute resources as an execution target. Traditional staging used discrete nodes: for high-performance I/O, managing storage variability; for application coupling; for in transit workflows.
43 Keep data in a Shared Data Space Maintain data in a shared space in staging Shared space can share the same memory as the simulation Multiple analysis and visualization services can access data
44 Managing Data Movement Data movement is expected to be a bottleneck for Exascale Use flexible resource allocation to optimize data movement
45 Hybrid Staging. (Diagram: multiple S3D boxes, each with in situ analysis and visualization, connected via ADIOS and asynchronous data transfer to parallel data staging for coupling/analytics/viz, where in transit statistics, topology, and visualization run on separate resources.) Use compute and deep-memory hierarchies to optimize the overall workflow for power vs. performance tradeoffs. Utilize hybrid staging for analytics and visualization. Abstract access to the complex/deep memory hierarchy.
46 Hybrid Staging. Hybrid staging is a combination of the available solutions. Classify data processing actions into: in line/in situ; asynchronous/in situ; asynchronous/in staging; asynchronous/on disk. What about tasks that span multiple classes? Partition the algorithms; place tasks to take advantage of data management.
47 Resources in Hybrid Staging: placement of analysis and visualization tasks in a complex system. Leverage fast/slow DRAM and local NVRAM/SSD. Impact of network data movement compared to memory movement. Network topology impact on performance and power.
48 Tradeoffs for Hybrid Staging. Going to disk is slow even for small application sizes; the inline approach adds more overhead to application runtime; the in transit approach gives better overall performance, at the additional cost of data movement. (Chart: normalized total CPU seconds.) Offline: process data after writing to disk. In line: process data in place, synchronously with the application. Staging: move data to staging resources for processing.
49 Impact of Task Mapping. Data-centric task mapping yields significant savings in the amount of data transferred by co-locating data producers and consumers. (Diagrams: concurrent coupling with CAP1: 512 and CAP2: 64 cores; sequential coupling with SAP1: 512, SAP2: 128, and SAP3: 384 cores.)
50 Complex Memory Hierarchy Workflow integrates knowledge of complex memory hierarchies Placement decisions are important factors for evaluation Local NVRAM must be leveraged Fast-small memory vs slow-big memory
52 Impact of NVRAM. Study how a deep memory hierarchy can be used for an end-to-end I/O analytic pipeline: how NVRAM can be used as a staging area; how much of each level of the memory hierarchy to use for the staging area; where to move data (RAM, NVRAM, SSD, disk, network); when (and how frequently) to move the data over the hierarchy.
53 Tradeoff between Frequency and Costs. The frequency of analysis impacts the energy and performance of analysis (NVRAM/disk gap); it is not a linear function, and the sweet spot is case-dependent. Experiments in collaboration with Steve Poole, ORNL.
54 Asynchronous Workflow Impact. The frequency of analysis impacts the energy and performance of analysis; it is not a linear function, and the bitter spot is case-dependent.
55 NVRAM for C/R: optimizing by hiding data movement to NVM. (Chart: execution time (ms) vs. NVM bandwidth per core (MB); run time and optimized run time in seconds.) NVM per-core data-copy bandwidth was assumed to be 450 MB/sec (compared to 2 GB device bandwidth). The experiment was conducted on a 12-core Intel Xeon node with 48 GB of DDR3 memory; to emulate NVM, 24 GB of memory was used. Experiments in collaboration with HP.
56 Deep Memory Tradeoffs. Driving synthetic application benchmark: generates data (allocates two matrices in DRAM memory and fills them with random data); operates on the generated data (runs a kernel: multiplication (MUL), addition (ADD), or read access (NOP) over the generated matrices); manages the generated data (keeps the data in RAM memory or a staging area: local Fusion-io, remote Fusion-io, HDD, SSD, etc.); asynchronously runs data analysis (reads data from the staging area). Quality of solutions: frequency of data analysis; resources for analysis (accuracy).
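A compressed sketch of such a benchmark; this is a pure-Python stand-in in which the matrices are small lists, the "staging area" is a plain in-memory list, and the asynchronous analysis is a thread:

```python
# Toy version of the driving synthetic benchmark: generate two random
# matrices, run a kernel (MUL, ADD, or NOP), copy the result into a
# staging buffer, and analyze it asynchronously. Sizes, kernels, and the
# in-memory "staging area" are placeholder assumptions.
import random
import threading

def make_matrix(n):
    return [[random.random() for _ in range(n)] for _ in range(n)]

def kernel(a, b, op):
    n = len(a)
    if op == "MUL":                    # dense matrix multiply
        return [[sum(a[i][k] * b[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
    if op == "ADD":                    # element-wise addition
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return a                           # NOP: read access only

def run_benchmark(n=16, op="MUL", rounds=3):
    staging, results, threads = [], [], []

    def analyze(m):                    # stands in for reading from staging
        results.append(sum(map(sum, m)) / (n * n))

    for _ in range(rounds):
        a, b = make_matrix(n), make_matrix(n)      # data generated in DRAM
        out = kernel(a, b, op)
        staging.append(out)                        # "move" result to staging
        t = threading.Thread(target=analyze, args=(staging[-1],))
        t.start()                                  # analysis overlaps compute
        threads.append(t)
    for t in threads:
        t.join()
    return results
```

Swapping the staging list for a file on NVRAM, SSD, or HDD, and varying the analysis frequency, exposes the same performance/energy tradeoffs the slide enumerates.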
57 Co-Design Decisions for Complex Memory Impact of using slow memory for SDMA processing Power vs Performance tradeoffs Size of Memory vs Speed of Memory Using a combination of fast memory and slow memory Fast memory size and speed compared to slow memory size Managing performance at the application level Tradeoff frequency of analysis with memory usage Use combination of asynchronous and synchronous computation Use knowledge of the workflow to study tradeoffs Write vs Read ratios
58 Dealing with UQ: our next big target for data management. Use analysis to select only the RoI; use the feature detection algorithms and their optimization. Ideal candidate for deep memory: the data is not used for a long time after output; data access is regular and predictable; move the data to fat/slow memory.
59 The Proxy App. The original application, instrumented with link tracing, records patterns via tracing tools and a pattern analyzer. Skel 2.0, a child of Skel that creates proxies for synchronous I/O: a Skel XML file for I/O (with the user's changes inserted) and a Skel XML file for communication (communication phase) drive the Skel code generator to produce the desired benchmark. Magically generates a workflow through traced patterns. Under development.
60 X-Stack Interactions Increase engagement with X-Stack projects Dynamic Task Scheduling Resource allocation Memory and Task dependencies Bring Fast Forwards into the conversation Storage Complex Memory hierarchies DRAM, NVRAM, ScratchPads, SSDs New processing elements GPUs, MICs
61 Outstanding Questions GPU/Accelerators for SDMA tasks? Staging nodes can be customized with additional resources Analysis tasks can be split similarly to in situ/in transit Integration with the Solver Data extraction impact on Solver performance Inline processing can impact cache Data copies pollute cache Exploring asynchronicity in algorithms
63 What about UQ? (Diagram: Solver, Store Solution, Compute Adjoint.) Feed the adjoint computation in reverse order; need to store the ENTIRE data set.
64 UQ v2.0. (Diagram: Solver, Store Solution, Compute Adjoint.) Store a smaller subset of the solution: u(t) from t_i to t_{i-1}; recompute u(t).
65 UQ v3.0. (Diagram: Solver, Filter Domains, Store Solution, Compute Adjoint.) Identify the interesting events; store an even smaller subset of the solution: u(t) from t_i to t_{i-1}; recompute u(t).
More informationBenchmarking Hadoop & HBase on Violin
Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages
More informationNV-DIMM: Fastest Tier in Your Storage Strategy
NV-DIMM: Fastest Tier in Your Storage Strategy Introducing ArxCis-NV, a Non-Volatile DIMM Author: Adrian Proctor, Viking Technology [email: adrian.proctor@vikingtechnology.com] This paper reviews how Non-Volatile
More informationOverlapping Data Transfer With Application Execution on Clusters
Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm reid@cs.toronto.edu stumm@eecg.toronto.edu Department of Computer Science Department of Electrical and Computer
More informationNew Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC
New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC Alan Gara Intel Fellow Exascale Chief Architect Legal Disclaimer Today s presentations contain forward-looking
More informationEnterprise Applications
Enterprise Applications Chi Ho Yue Sorav Bansal Shivnath Babu Amin Firoozshahian EE392C Emerging Applications Study Spring 2003 Functionality Online Transaction Processing (OLTP) Users/apps interacting
More informationSpeeding Up Cloud/Server Applications Using Flash Memory
Speeding Up Cloud/Server Applications Using Flash Memory Sudipta Sengupta Microsoft Research, Redmond, WA, USA Contains work that is joint with B. Debnath (Univ. of Minnesota) and J. Li (Microsoft Research,
More informationBringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the
More informationIn-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps. Yu Su, Yi Wang, Gagan Agrawal The Ohio State University
In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps Yu Su, Yi Wang, Gagan Agrawal The Ohio State University Motivation HPC Trends Huge performance gap CPU: extremely fast for generating
More informationPower-Aware High-Performance Scientific Computing
Power-Aware High-Performance Scientific Computing Padma Raghavan Scalable Computing Laboratory Department of Computer Science Engineering The Pennsylvania State University http://www.cse.psu.edu/~raghavan
More informationBig Fast Data Hadoop acceleration with Flash. June 2013
Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional
More informationThe IntelliMagic White Paper on: Storage Performance Analysis for an IBM San Volume Controller (SVC) (IBM V7000)
The IntelliMagic White Paper on: Storage Performance Analysis for an IBM San Volume Controller (SVC) (IBM V7000) IntelliMagic, Inc. 558 Silicon Drive Ste 101 Southlake, Texas 76092 USA Tel: 214-432-7920
More informationQLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE
QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE QlikView Technical Brief April 2011 www.qlikview.com Introduction This technical brief covers an overview of the QlikView product components and architecture
More informationThe Fusion of Supercomputing and Big Data. Peter Ungaro President & CEO
The Fusion of Supercomputing and Big Data Peter Ungaro President & CEO The Supercomputing Company Supercomputing Big Data Because some great things never change One other thing that hasn t changed. Cray
More informationJun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC
Jun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC Agenda Quick Overview of Impala Design Challenges of an Impala Deployment Case Study: Use Simulation-Based Approach to Design
More informationTrends in High-Performance Computing for Power Grid Applications
Trends in High-Performance Computing for Power Grid Applications Franz Franchetti ECE, Carnegie Mellon University www.spiral.net Co-Founder, SpiralGen www.spiralgen.com This talk presents my personal views
More informationInteractive Level-Set Deformation On the GPU
Interactive Level-Set Deformation On the GPU Institute for Data Analysis and Visualization University of California, Davis Problem Statement Goal Interactive system for deformable surface manipulation
More informationWindows Server Performance Monitoring
Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly
More informationVirtuoso and Database Scalability
Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of
More informationCapacity Planning for Microsoft SharePoint Technologies
Capacity Planning for Microsoft SharePoint Technologies Capacity Planning The process of evaluating a technology against the needs of an organization, and making an educated decision about the configuration
More informationScaling from Datacenter to Client
Scaling from Datacenter to Client KeunSoo Jo Sr. Manager Memory Product Planning Samsung Semiconductor Audio-Visual Sponsor Outline SSD Market Overview & Trends - Enterprise What brought us to NVMe Technology
More informationLSI MegaRAID CacheCade Performance Evaluation in a Web Server Environment
LSI MegaRAID CacheCade Performance Evaluation in a Web Server Environment Evaluation report prepared under contract with LSI Corporation Introduction Interest in solid-state storage (SSS) is high, and
More informationUsing Synology SSD Technology to Enhance System Performance Synology Inc.
Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_SSD_Cache_WP_ 20140512 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges...
More informationA Deduplication File System & Course Review
A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror
More informationMaximizing SQL Server Virtualization Performance
Maximizing SQL Server Virtualization Performance Michael Otey Senior Technical Director Windows IT Pro SQL Server Pro 1 What this presentation covers Host configuration guidelines CPU, RAM, networking
More informationLow-Power Amdahl-Balanced Blades for Data-Intensive Computing
Thanks to NVIDIA, Microsoft External Research, NSF, Moore Foundation, OCZ Technology Low-Power Amdahl-Balanced Blades for Data-Intensive Computing Alex Szalay, Andreas Terzis, Alainna White, Howie Huang,
More informationOracle Database Scalability in VMware ESX VMware ESX 3.5
Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises
More informationUsing Synology SSD Technology to Enhance System Performance. Based on DSM 5.2
Using Synology SSD Technology to Enhance System Performance Based on DSM 5.2 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD Cache as Solution...
More informationA Close Look at PCI Express SSDs. Shirish Jamthe Director of System Engineering Virident Systems, Inc. August 2011
A Close Look at PCI Express SSDs Shirish Jamthe Director of System Engineering Virident Systems, Inc. August 2011 Macro Datacenter Trends Key driver: Information Processing Data Footprint (PB) CAGR: 100%
More informationBeyond Embarrassingly Parallel Big Data. William Gropp www.cs.illinois.edu/~wgropp
Beyond Embarrassingly Parallel Big Data William Gropp www.cs.illinois.edu/~wgropp Messages Big is big Data driven is an important area, but not all data driven problems are big data (despite current hype).
More informationGPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
More informationFull and Para Virtualization
Full and Para Virtualization Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF x86 Hardware Virtualization The x86 architecture offers four levels
More informationBig Data Processing with Google s MapReduce. Alexandru Costan
1 Big Data Processing with Google s MapReduce Alexandru Costan Outline Motivation MapReduce programming model Examples MapReduce system architecture Limitations Extensions 2 Motivation Big Data @Google:
More informationHP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief
Technical white paper HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief Scale-up your Microsoft SQL Server environment to new heights Table of contents Executive summary... 2 Introduction...
More informationMAGENTO HOSTING Progressive Server Performance Improvements
MAGENTO HOSTING Progressive Server Performance Improvements Simple Helix, LLC 4092 Memorial Parkway Ste 202 Huntsville, AL 35802 sales@simplehelix.com 1.866.963.0424 www.simplehelix.com 2 Table of Contents
More informationHADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW
HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW 757 Maleta Lane, Suite 201 Castle Rock, CO 80108 Brett Weninger, Managing Director brett.weninger@adurant.com Dave Smelker, Managing Principal dave.smelker@adurant.com
More informationAchieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-Kernels Jiannan Ouyang, Brian Kocoloski, John Lange The Prognostic Lab @ University of Pittsburgh Kevin Pedretti Sandia National Laboratories HPDC 2015
More informationThe Classical Architecture. Storage 1 / 36
1 / 36 The Problem Application Data? Filesystem Logical Drive Physical Drive 2 / 36 Requirements There are different classes of requirements: Data Independence application is shielded from physical storage
More informationGPU File System Encryption Kartik Kulkarni and Eugene Linkov
GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through
More informationMS SQL Performance (Tuning) Best Practices:
MS SQL Performance (Tuning) Best Practices: 1. Don t share the SQL server hardware with other services If other workloads are running on the same server where SQL Server is running, memory and other hardware
More informationBENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
More informationRAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University
RAMCloud and the Low- Latency Datacenter John Ousterhout Stanford University Most important driver for innovation in computer systems: Rise of the datacenter Phase 1: large scale Phase 2: low latency Introduction
More informationICRI-CI Retreat Architecture track
ICRI-CI Retreat Architecture track Uri Weiser June 5 th 2015 - Funnel: Memory Traffic Reduction for Big Data & Machine Learning (Uri) - Accelerators for Big Data & Machine Learning (Ran) - Machine Learning
More informationOptimizing the Performance of Your Longview Application
Optimizing the Performance of Your Longview Application François Lalonde, Director Application Support May 15, 2013 Disclaimer This presentation is provided to you solely for information purposes, is not
More informationDemystifying Deduplication for Backup with the Dell DR4000
Demystifying Deduplication for Backup with the Dell DR4000 This Dell Technical White Paper explains how deduplication with the DR4000 can help your organization save time, space, and money. John Bassett
More informationVALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS
VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS Perhaad Mistry, Yash Ukidave, Dana Schaa, David Kaeli Department of Electrical and Computer Engineering Northeastern University,
More informationPARALLELS CLOUD STORAGE
PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...
More informationHETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK
HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance
More informationVMware Virtual SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014
VMware SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014 VMware SAN Backup Using VMware vsphere Table of Contents Introduction.... 3 vsphere Architectural Overview... 4 SAN Backup
More informationMaximizing Hadoop Performance and Storage Capacity with AltraHD TM
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created
More informationSo#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell
So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell R&D Manager, Scalable System So#ware Department Sandia National Laboratories is a multi-program laboratory managed and
More informationParallel Analysis and Visualization on Cray Compute Node Linux
Parallel Analysis and Visualization on Cray Compute Node Linux David Pugmire, Oak Ridge National Laboratory and Hank Childs, Lawrence Livermore National Laboratory and Sean Ahern, Oak Ridge National Laboratory
More informationBig Graph Processing: Some Background
Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI-580, Bo Wu Graphs
More informationFlash Memory Arrays Enabling the Virtualized Data Center. July 2010
Flash Memory Arrays Enabling the Virtualized Data Center July 2010 2 Flash Memory Arrays Enabling the Virtualized Data Center This White Paper describes a new product category, the flash Memory Array,
More informationioscale: The Holy Grail for Hyperscale
ioscale: The Holy Grail for Hyperscale The New World of Hyperscale Hyperscale describes new cloud computing deployments where hundreds or thousands of distributed servers support millions of remote, often
More informationIBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud
IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud February 25, 2014 1 Agenda v Mapping clients needs to cloud technologies v Addressing your pain
More informationIntel Data Direct I/O Technology (Intel DDIO): A Primer >
Intel Data Direct I/O Technology (Intel DDIO): A Primer > Technical Brief February 2012 Revision 1.0 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
More informationFPGA-based Multithreading for In-Memory Hash Joins
FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded
More informationIntel Solid- State Drive Data Center P3700 Series NVMe Hybrid Storage Performance
Intel Solid- State Drive Data Center P3700 Series NVMe Hybrid Storage Performance Hybrid Storage Performance Gains for IOPS and Bandwidth Utilizing Colfax Servers and Enmotus FuzeDrive Software NVMe Hybrid
More informationHPC Deployment of OpenFOAM in an Industrial Setting
HPC Deployment of OpenFOAM in an Industrial Setting Hrvoje Jasak h.jasak@wikki.co.uk Wikki Ltd, United Kingdom PRACE Seminar: Industrial Usage of HPC Stockholm, Sweden, 28-29 March 2011 HPC Deployment
More informationWhy Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat
Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower The traditional approach better performance Why computers are
More informationEMC XTREMIO EXECUTIVE OVERVIEW
EMC XTREMIO EXECUTIVE OVERVIEW COMPANY BACKGROUND XtremIO develops enterprise data storage systems based completely on random access media such as flash solid-state drives (SSDs). By leveraging the underlying
More informationParallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications
More informationThe Data Placement Challenge
The Data Placement Challenge Entire Dataset Applications Active Data Lowest $/IOP Highest throughput Lowest latency 10-20% Right Place Right Cost Right Time 100% 2 2 What s Driving the AST Discussion?
More informationOpenMP Programming on ScaleMP
OpenMP Programming on ScaleMP Dirk Schmidl schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign
More informationElements of Scalable Data Analysis and Visualization
Elements of Scalable Data Analysis and Visualization www.ultravis.org DOE CGF Petascale Computing Session Tom Peterka tpeterka@mcs.anl.gov Mathematics and Computer Science Division 0. Preface - Science
More information