Large Scale Parallel Reservoir Simulations on a Linux PC-Cluster

Walid A. Habiballah and M. Ehtesham Hayder
Petroleum Engineering Application Services Department
Saudi Aramco, Dhahran 31311, Saudi Arabia
{[email protected] and [email protected]}

Saudi Aramco (2003). All rights reserved. No portion of this article may be reproduced by any process or technique without the express written consent of Saudi Aramco. Permission has been granted to QuarterPower Media to publish this work.

Abstract: Numerical reservoir simulation is an important tool used by engineers to develop producing strategies for oil and gas fields and to understand fluid flow behavior in subsurface reservoirs. Demand for mega-scale reservoir simulations is increasing, particularly for the large fields common in Saudi Arabia. These mega-cell simulations are expensive on traditional supercomputers. PC-cluster technology is evolving rapidly and promises to be a low-cost alternative to traditional supercomputers. In this work, we investigated the use of state-of-the-art PC clusters for the simulation of massive reservoir models. Our investigation indicates that large-scale reservoir simulations can be performed efficiently and cost-effectively on the latest PC clusters. In this paper, we discuss some of our findings and issues related to parallel reservoir simulations on PC clusters. We also present a performance comparison of Xeon Linux PC clusters and an IBM SP Nighthawk supercomputer.

Keywords: Parallel reservoir simulations, PC cluster, Linux cluster, parallel computations.

1. Introduction

POWERS [1,2] is Saudi Aramco's in-house developed parallel reservoir simulator. This simulator is capable of simulating black-oil, compositional, dual porosity dual permeability, and locally refined grid models. Currently POWERS is used to simulate reservoir models with sizes ranging from a few hundred thousand to ten million cells. Over the next few years, the number and size of reservoir models are expected to grow significantly.
The cost of such simulations on traditional supercomputers will be exorbitantly high. The motivation behind this investigation was to identify and evaluate low-cost computing platforms that will meet the increasing computational needs of reservoir simulation at Saudi Aramco.

PC clusters [3] are enjoying growing interest as a high-performance computing platform, and the number of installed clusters has been growing (see the Top500 supercomputer list [4]). The PC-cluster approach has proven to be a powerful platform for many large-scale scientific computations (see references [5-7]). Indeed, Saudi Aramco has applied and benefited from this technology for its geophysical seismic data processing [8]. A PC cluster is a collection of low-cost commodity computers (PCs) interconnected by a switch. Although the notion of building a cluster from commodity hardware is simple, the performance and throughput of such a cluster can be quite impressive. This has become possible because the computational engine (the processor) has improved significantly, most notably in the last two years. At present, the performance of the processors in a typical PC cluster (2.4 GHz Pentium IV) is superior to that of a traditional supercomputer. Fig. 1 shows the SPECfp2000 numbers [9] for three different processors. The Power3 processor is used in the IBM Nighthawk system and the MIPS R14K processor in SGI Origin systems, while the Pentium IV processor is typical of desktop PCs. The SPECfp2000 number is only one indicator; the ultimate measure of performance is the performance of the target application.

PC clusters have a price/performance advantage over traditional supercomputers. Intel chips are manufactured in the millions, unlike processors designed for specialized use; this volume has contributed considerably to the lower price of Intel chips and hence to the evolution of PC clusters. At present, the candidate computing platforms for parallel reservoir simulation are typical high-performance computing systems (such as the IBM PWR3/4 or SGI Origin) and PC clusters. The cluster platform has the advantage of high-speed processors. For a tightly coupled application such as reservoir simulation, the performance of the interconnection network is very important. High-speed switches such as Quadrics and Myricom Myrinet are currently available for building powerful clusters.

IBM SP Nighthawk (NH2) nodes are currently used as the platform for production reservoir simulations at Saudi Aramco. POWERS uses both shared and distributed memory parallel programming models and is fine-tuned to run optimally on IBM Nighthawk nodes. These nodes use a 64-bit address space and support both shared memory (OpenMP directives) and distributed memory (MPI calls) programming models. To investigate PC clusters, we acquired a 32-node test cluster based on Intel Xeon 2.0 GHz processors connected by a Quadrics switch. Later we acquired a system with a Myricom Myrinet switch. Like the IBM Nighthawks, our Xeon clusters support both MPI calls and OpenMP directives for parallelization (MPI among all processors, and OpenMP between the two processors within a node).
Since each node on our clusters has only two processors, the scope of parallelization using OpenMP directives is limited. Our preliminary experience indicates that the overhead associated with inter-node communication on our current hardware is less than the overhead associated with intra-node (i.e., between two processors within a node) parallel computation. This was true for both OpenMP and MPI implementations of parallel computations on two processors within a node.

Fig. 1. Performance of different processors (SPECfp2000 numbers for the 375 MHz Power3 used in the IBM NH-2, the 500 MHz MIPS R14K used in SGI systems, and the 2.4 GHz Pentium IV)

We extended parallelization and revised communication algorithms in POWERS to improve performance on our clusters. We tested our new code on various simulation models. The size of our test models ranged from a few hundred thousand cells to about 10 million cells, and the number of wells was up to 4,000. In this paper we examine issues which are important for obtaining good performance on a large number of processors in a PC cluster. In section 2 we discuss the POWERS formulation. Next we describe our hardware platforms in section 3. Our experiences with porting POWERS onto our PC clusters are given in section 4. In section 5, we discuss our experiences with POWERS computations on the test platforms. Finally, in section 6, we give our conclusions.

2. Formulation of the Simulator

POWERS uses a finite volume formulation to compute flows in a reservoir and solves mass conservation equations for multi-component multiphase fluid flows in porous media.
Governing equations are derived from mass conservation and are written for each component (oil, water, and gas) as

\frac{\partial}{\partial t}\left(\phi \rho_a S_a\right) + \nabla \cdot U_a = q \qquad (1)

where U_a is the Darcy velocity of each component, computed as

U_a = -\frac{K_a(S_a)\, K}{\mu_a}\, \rho_a \left( \nabla P_a - \rho_a g \nabla H \right) \qquad (2)

A volume balance equation is used as a constraint:

\sum_a S_a = 1 \qquad (3)

If fluid properties are functions of only the surrounding pressure and the bubble-point pressure, a simplified approach known as the black oil formulation is used. In such models, there may be up to three fluid components (water, oil, and gas). Some reservoirs are above the bubble point pressure and are modeled with only two components, namely water and oil. It is important to consider more detailed fluid compositions to determine properties in volatile oil reservoirs and also where enriched gas is injected for enhanced recovery. These reservoirs are modeled using a compositional formulation, where thermodynamic equilibrium calculations are performed using an equation of state. The reservoir fluid in these models is composed of water and a number of hydrocarbon components. In black oil simulations, solutions are obtained in terms of pressure and saturations by either the Implicit Pressure Explicit Saturation (IMPES) method or a fully implicit method.

The heterogeneous nature of rocks in a fractured reservoir may be modeled using the Dual Porosity Dual Permeability (DPDP) formulation. In the DPDP formulation, reservoir rock is represented by a collection of highly permeable fractures and low-permeability matrices (or homogeneous solid rocks). Flows in fractures and matrices are coupled. Because of their very small sizes, typical fractures cannot be resolved on a computational grid. Therefore, each grid cell is modeled to contain a lumped fracture and a matrix region.

Rock descriptions in a simulation model are derived from an underlying geological model of the reservoir. Typically, many grid cells of a geological model are consolidated into a single simulation grid cell. This process is known as upscaling. Proprietary software packages are used to upscale geological model data into simulation input data for POWERS. In-house software packages are used to generate well production, injection, and completion histories directly from Saudi Aramco's corporate database. These data are used for history matching simulations. In a prediction study, computations are guided by production strategies given in the simulator input data. Such computations take into account various parameters such as the production target specified by the user, limitations of surface facilities, well rate/pressure restrictions, etc.

Governing equations describing the flows are discretized to get a set of equations for the primary variables. These equations are linearized and solved using the Newton-Raphson technique. The linearized system of equations is solved within each Newton step using a parallel linear solver known as the reduced system solver. The solution technique used in POWERS is based on the generalized conjugate residual (GCR) method and Orthomin with a truncated Neumann series preconditioner.
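For concreteness, here is an illustrative (not reproduced from the paper) backward-Euler finite-volume discretization of Eq. (1) for a cell i with volume V_i, together with the Newton step that produces the linear systems handed to the reduced system solver; the face-flux notation is our own shorthand:

% Illustrative discrete residual for component a in cell i, and the Newton update
R_{a,i} \;=\; \frac{V_i}{\Delta t}\Big[(\phi\rho_a S_a)_i^{\,n+1}-(\phi\rho_a S_a)_i^{\,n}\Big]
        \;+\; \sum_{f \in \mathrm{faces}(i)} (U_a \cdot n_f)\, A_f \;-\; q_{a,i} \;=\; 0

J^{(k)}\,\delta x \;=\; -R\!\left(x^{(k)}\right), \qquad x^{(k+1)} \;=\; x^{(k)} + \delta x

Here x collects the primary variables (pressure and saturations), J is the Jacobian of the residuals, and the iteration is repeated until R is sufficiently small.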
POWERS uses three-dimensional structured grid cells to discretize the reservoir. In general, flows inside a reservoir can be reasonably represented on a non-uniform structured grid. However, for accurate simulation one may need high resolution near a well, while such resolution may not be necessary away from a well. In those cases, the computational grid near a well can be refined to generate a composite grid consisting of a base grid and locally refined grid (LGR) patches. The current version of POWERS supports a two-level grid system: the base grid and one level of fine grid patches. An iterative multigrid scheme is used to solve the governing equations. At each iteration, the coarse grid solution is obtained and used to update the residuals on the fine grid. The fine grid equations are then solved, and the solutions are used to update the residuals on the base grid. This iterative process is repeated until a converged solution is obtained.

3. Hardware platforms

Our present study was done on an IBM SP Nighthawk (NH2) and two PC-cluster systems. The IBM NH2 uses 375 MHz Power3 processors and is currently the system used for production reservoir simulations at Saudi Aramco. Our first PC cluster uses the Quadrics switch, and the second uses the Myricom Myrinet switch. The Quadrics cluster was used for initial porting and testing of POWERS. We will refer to the Quadrics cluster as Cluster_Q and the Myrinet cluster as Cluster_M in this paper.

We considered five important items in configuring our PC clusters: processor, network connection, memory, I/O, and the software tools available for development and support of our simulator and simulation activities. Selection of the cluster processor was fairly straightforward: the Pentium IV had advantages over the Pentium III (higher clock rate, better floating-point performance, and higher bus speed), and the Pentium IV had just been released while the Pentium III was at the end of its development life. The Xeon gave us dual-processor capability over the Pentium IV and would allow us to use multi-threading in our code. Rack-mounted Xeon clusters were available from numerous vendors, which allowed us to hold an open bid instead of a sole-source acquisition. For the inter-node network connections, we considered four options: Fast Ethernet, Gigabit Ethernet, Myricom Myrinet, and Quadrics Elan switches. Specifications of these switches are shown in Table 1.
Table 1. Specifications of various switches: bandwidth (Mbytes/sec) and latency (microseconds) for Fast Ethernet, Gigabit, Myrinet, and Quadrics

High-performance parallel reservoir simulation necessitates the use of high-speed communication hardware. Communication between compute nodes is required to compute derivatives at subdomain (grid block) boundaries. Fast Ethernet and Gigabit are low-cost commodity switches with limited bandwidth. They are used for I/O and cluster management but are unsuitable for data communication among grid blocks (at least 200 Mbytes/sec of bandwidth is desirable). The Myricom Myrinet and Quadrics Elan switches have high bandwidth and are more suitable for parallel reservoir simulations.

The system based on the Quadrics switch has 32 nodes of dual Intel Xeon processors interconnected with a Quadrics switch. The network and hardware configuration of this system is shown in Fig. 2. At the time of our hardware acquisition, rack-mounted Xeon-based systems were not available. Therefore, we chose to go with a cluster of workstations based on Xeon systems. Our system consisted of 32 Dell 530 workstations, each with two 2 GHz Xeon processors and 4 GB of memory. One of the 32 nodes acted as the master node. The master node was connected to the compute nodes via a gigabit switch in addition to the Quadrics network. Besides doing computations, the master node also acted as a file server for the compute nodes and as a submission and login node for users. Jobs were submitted from the master node using the Resource Management System (RMS) software, which Quadrics provides to manage parallel tasks across multiple nodes. Every node in this cluster has access to two networks: the compute network (Quadrics) for communication during simulations, and the management network (Fast Ethernet) for system administration activities.

The cluster with the Myricom Myrinet switch consists of two sub-clusters, each of which has 128 compute nodes, a master node, a management node, and external storage. Each compute node has a Myrinet card and connects to the sub-cluster Myrinet switch. In addition, each compute node is also connected to the rack switch via Fast Ethernet. Each compute node has two 2.4 GHz Xeon processors and 4 GB of memory. The master node is a two-processor server where parallel jobs are initiated; ten TB of external storage is connected to it. Details of our clusters can be found in references 10 and 11.
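Switch characteristics such as those in Table 1 are typically measured with a simple two-process ping-pong kernel, similar in spirit to the code kernels used for the interconnect benchmarking described in section 4. The sketch below is our own minimal illustration, not the benchmark used by the authors; the message size and repetition count are arbitrary choices.

/* Minimal MPI ping-pong sketch for estimating point-to-point latency
 * and bandwidth between two ranks (illustration only). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nbytes = 1 << 20;          /* 1 Mbyte message */
    const int reps   = 100;
    char *buf = malloc(nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t = (MPI_Wtime() - t0) / (2.0 * reps);   /* one-way time per message */
    if (rank == 0)
        printf("one-way time %.2f us, bandwidth %.1f Mbytes/s\n",
               t * 1e6, nbytes / t / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}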
Fig. 2. Configuration of Cluster_Q, the Quadrics-based cluster (workstation nodes connected by the Quadrics compute network, a Fast Ethernet management network, and a serial concentrator)
4. Implementation of POWERS on the PC-Cluster

POWERS, originally written in Connection Machine FORTRAN (CMF), was ported to a platform-independent architecture several years ago. The parallelization scheme in this code supports both shared- and distributed-memory hardware architectures. This is done with explicit OpenMP directives and MPI calls. Ideally, shared memory parallelism is triggered among multiple processors on the same node via OpenMP directives, and distributed memory parallelism across multiple nodes is triggered via MPI calls (MPI can also be used for communication between processors within the same node). POWERS is a dual-parallel code: it uses shared memory programming (OpenMP directives) in one spatial direction and distributed memory programming (MPI calls) in another spatial direction. These two parallelization schemes are orthogonal, and this orthogonality works well on almost all hardware architectures. We expanded MPI parallelization to a second dimension to take full advantage of PC-cluster technology. This allowed efficient utilization of the distributed nature of the PC-cluster architecture and allowed us to utilize more processors on the cluster.

We went through the following steps in our implementation and optimization of POWERS on our PC clusters:

a) Performed a straight port (recompiled the code on the cluster using the Intel ifc compiler)
b) Expanded MPI parallelization to a second dimension
c) Benchmarked the interconnect switches (Myricom Myrinet and Quadrics Elan) using code kernels and the full code
d) Revised and optimized the MPI communication algorithm in the code
e) Tuned performance.

Initially we ported POWERS onto the 32-node Quadrics cluster (Cluster_Q). We encountered several problems in the straight port using the Intel compiler (v.6). The POWERS code makes extensive use of Fortran-95 array syntax with OpenMP directives mixed with MPI calls. Also, the code allocates large chunks of memory (at times reaching two GB per MPI process). This subjected software tools and libraries to significant stress (vendors were very helpful and were able to fix bugs in their software expeditiously).

Once POWERS was functioning correctly on the cluster, the second step was undertaken. Fig. 3 shows an areal cross-section of the computational grid and the parallelization scheme of the original code. The shaded squares represent a collection of cells in the simulation model. The first (I) dimension is used for MPI decomposition among the nodes, and the second (J) dimension is used for OpenMP threads. This dual decomposition divides the computational grid into I-slabs, and every I-slab is then divided in the J-direction among the threads running on the processors within a node.

Fig. 3. Parallelization scheme with the I-dimension for MPI and the J-dimension for OpenMP (areal cross-section showing I-slabs assigned to MPI nodes 0 to n, each slab divided among the OpenMP threads/CPUs within a node)
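As an illustration of this dual decomposition, the C sketch below (ours, not the POWERS source, which is Fortran-95; all array names and extents are invented) shows each MPI rank working on the I-planes of its slab while OpenMP threads share the J loop:

/* Illustrative dual-parallel cell update: each MPI rank owns a slab of
 * I-planes; OpenMP threads within the node split the J loop.
 * All names and extents are hypothetical, not from POWERS. */
#include <omp.h>

#define NJ 120
#define NK 40

void update_slab(double p[][NJ][NK], int islab, double factor)
{
    /* islab = number of I-planes assigned to this MPI rank */
    #pragma omp parallel for
    for (int j = 0; j < NJ; j++)             /* threads split the J direction  */
        for (int li = 0; li < islab; li++)   /* local I index within the slab  */
            for (int k = 0; k < NK; k++)
                p[li][j][k] *= factor;       /* stand-in for the real cell work */
}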
Fig. 4 shows our new parallelization scheme. The computational grid is decomposed into areal blocks, and every block is assigned to a particular node. Communications between blocks (such as those needed for flux calculations) are done using MPI calls. Furthermore, a node may utilize threads (in the J-dimension) for computations within its block. The communication needed for flux calculations depends on the surface area of each decomposed block. Such communications are regular, and one can minimize them by choosing a decomposition that minimizes the surface area of the decomposed domains. There are also irregular communications in POWERS for well calculations; these depend on the number of decomposed grid blocks (MPI processes), wells, and completions. The choice of domain decomposition scheme affects the amount of communication and, in turn, the execution time. We briefly examine the effects of domain decomposition, or blocking, in this study. Further discussion of communication issues will be covered in reference 11.

Fig. 4. Parallelization scheme utilizing MPI and OpenMP (areal blocks assigned to MPI processes, with OpenMP threads over the CPUs within each block)
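The regular communication in this scheme amounts to exchanging one layer of boundary cells with each neighboring block before the flux calculation. A minimal sketch of such an exchange along one direction with non-blocking MPI follows; this is our illustration, and the neighbor ranks, buffers, and counts are assumptions rather than POWERS code.

/* Illustrative halo exchange in one direction of the areal block
 * decomposition: post receives and sends for both neighbors, then wait.
 * Neighbor ranks, buffers, and counts are assumed for illustration. */
#include <mpi.h>

void exchange_direction(double *send_lo, double *send_hi,
                        double *recv_lo, double *recv_hi, int ncells,
                        int nbr_lo, int nbr_hi, MPI_Comm comm)
{
    MPI_Request req[4];

    /* nbr_lo / nbr_hi may be MPI_PROC_NULL at the edge of the grid;
     * MPI then turns the corresponding calls into no-ops. */
    MPI_Irecv(recv_lo, ncells, MPI_DOUBLE, nbr_lo, 0, comm, &req[0]);
    MPI_Irecv(recv_hi, ncells, MPI_DOUBLE, nbr_hi, 1, comm, &req[1]);
    MPI_Isend(send_lo, ncells, MPI_DOUBLE, nbr_lo, 1, comm, &req[2]);
    MPI_Isend(send_hi, ncells, MPI_DOUBLE, nbr_hi, 0, comm, &req[3]);
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
}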
5. Results

We evaluated the Quadrics and the Myricom Myrinet switches as alternative interconnection networks. Our initial results are given in reference 10, and more detailed results in reference 11. All computations in this study were done using 64-bit arithmetic. The performance of POWERS on a PC cluster depends on a number of factors, such as the communication-to-computation ratio, the extent of memory protection required for multi-threading, the size of the problem, the amount of I/O, and the number of wells and completions. In this paper, we examine some optimization issues related to large scale reservoir simulation on PC clusters.

We have utilized four reservoir models (Table 2) for this paper. These four models were selected to examine different computational and communication characteristics of reservoir simulations. The first two models are large: Model I is a 3.7 million cell model with 144 wells, and Model II is a 9.6 million cell model with about 3,000 wells (one of the larger reservoir models built in Saudi Aramco). We chose this model to identify and study issues which could be important for large scale simulation in cluster environments. Models III and IV have relatively small numbers of grid cells. The large number of wells in Model III makes it suitable for studying the effects of well-related computations and communications. A typical simulation result (a saturation plot) for Model II is shown in Appendix A.
Model       Grid Size (million cells)   Number of wells   Years Simulated
Model I     3.7                         144               —
Model II    9.6                         2,960             5
Model III   0.2                         4,000             —
Model IV    —                           —                 —

Table 2. Test cases

We present timings for Model I in Table 3. An aggressive compilation flag (-O3) and other tuning options were used on both the IBM NH2 and the PC clusters. POWERS was compiled with the Intel compiler (v.6) on both Cluster_Q (the cluster with the Quadrics switch) and Cluster_M (the cluster with the Myrinet switch). The performance of the Quadrics and Myrinet clusters was within 20% of each other. The PC clusters were about two to three times faster than the IBM NH2 for simulations on the same number of processors. Processor speed played an important role in reducing execution times on the clusters: Cluster_Q has 2 GHz and Cluster_M has 2.4 GHz processors, compared to the 375 MHz Power3 processors of the IBM NH2. Later, the Intel compiler on Cluster_M was upgraded from v.6 to v.7; the Myricom Myrinet GM library was upgraded at the same time. As shown in Fig. 5, these two upgrades reduced execution time by about 8% for the 16-processor run and about 25% for the 32-processor run.

CPU   Cluster_Q (sec)   Cluster_M (sec)   IBM-NH2 (sec)
16    3,900             3,200             —
32    —                 2,175             4,790

Table 3. Model I execution time on the PC clusters and the IBM NH2
Fig. 5. Execution time of Model I on the PC clusters (time in seconds versus number of processors for Cluster_Q with the v.6 compiler, Cluster_M with v.6, and Cluster_M with v.7)

We next examine the timings and scalability of Model I. Our results on Cluster_M and the IBM NH2 are shown in Table 4 and Fig. 6. We present the execution time (Tt) and the core simulation time (Tc), which excludes initialization and I/O overheads, in Table 4, and the speedups in execution time and in core simulation time in Fig. 6. Simulations were made on up to 120 processors. Speedups were comparable and generally slightly higher on the IBM NH2 than on Cluster_M. On 96 processors, the speedup on Cluster_M was slightly better than that on the IBM NH2. However, the simulation time on Cluster_M increased from 740 seconds on 96 processors to 770 seconds on 120 processors, although the core simulation time decreased from 630 seconds to 520 seconds. The overhead in these two simulations jumped from about 15% on 96 processors to over 30% on 120 processors. We examine overheads on large numbers of processors later in this study.
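The overhead fractions quoted above follow directly from the times given, taking the overhead to be the part of the execution time not spent in core simulation:

\frac{T_t - T_c}{T_t}\bigg|_{96\ \mathrm{CPUs}} = \frac{740 - 630}{740} \approx 0.15,
\qquad
\frac{T_t - T_c}{T_t}\bigg|_{120\ \mathrm{CPUs}} = \frac{770 - 520}{770} \approx 0.32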
Table 4. Model I simulation time on Cluster_M and the IBM NH2: total execution time (Tt) and core simulation time (Tc), in seconds, for 16 to 120 processors

Fig. 6. Scalability of Model I (speedup versus number of processors: ideal, NH2 total, NH2 core, cluster total, cluster core)

Next we consider Model II and show our results, on up to 120 processors, in Table 5. On smaller numbers of processors, Cluster_M was much faster than the IBM NH2.
But overheads on the cluster were high on large numbers of processors; in fact, on 120 processors the core simulation time was only about 45% of the execution time. To understand the cause of the large overheads, we examine timings of various sections of the code for simulations of Models I and II. In Figures 7 and 8, we show the communication and synchronization overhead times along with the execution time (Tt) and core simulation time (Tc) for these models, as measured on the master node (node 0).

CPU   Tt (cluster) (sec.)   Tc (cluster) (sec.)   Tt (NH2) (sec.)   Tc (NH2) (sec.)
 48   4,300                 3,780                 11,115            —
 64   —                     2,815                  8,960            —
 96   —                     2,284                  6,000            —
120   4,190                 1,864                  4,415            4,180

Table 5. Simulation times of Model II on Cluster_M and the IBM NH2

Fig. 7. Timing of code sections in Model I (communication, synchronization, core, and total time versus number of processors)

Fig. 8. Timing of code sections in Model II (communication, synchronization, core, and total time versus number of processors)
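Per-section breakdowns like those in Figs. 7 and 8 can be collected by accumulating wall-clock time around the code sections of interest. The sketch below is our illustration; the section names and the routines in the usage comment are hypothetical.

/* Illustrative per-section timers; totals are reported from the master
 * node (node 0). Section names are hypothetical, not from POWERS. */
#include <mpi.h>
#include <stdio.h>

static double t_comm = 0.0, t_synch = 0.0, t_core = 0.0;

#define TIMED(accum, call)               \
    do {                                 \
        double _t0 = MPI_Wtime();        \
        call;                            \
        (accum) += MPI_Wtime() - _t0;    \
    } while (0)

void report_timers(void)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("comm %.1f s  synch %.1f s  core %.1f s\n",
               t_comm, t_synch, t_core);
}

/* Usage in the time-step loop, e.g. (hypothetical routines):
 *   TIMED(t_core,  compute_cell_properties());
 *   TIMED(t_comm,  exchange_halos());
 *   TIMED(t_synch, MPI_Barrier(MPI_COMM_WORLD));
 */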
Communication overheads for Model I did not change significantly as the number of processors increased, and the same was true for Model II. On 120 processors, the communication overhead was about 15% of the total execution time for Model I and about 13% for Model II. In the absence of any other overhead, the speedup on 120 processors would be about 100. Synchronization overheads for Model II were very high. This may have been caused by load imbalance in our simulation. We are now trying to reduce overheads by adjusting the distribution of the grid among the computing nodes. We are also examining our communication (MPI) routines and trying to reduce unnecessary barriers and blocking in data sends and receives.
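Removing an unnecessary barrier or blocking call typically means posting the transfer early with non-blocking MPI, overlapping it with local work that does not touch the exchanged buffers, and waiting only when the data is actually needed. A hedged sketch of the pattern (ours, not the POWERS code):

/* Pattern sketch: the blocking version serializes the exchange, and the
 * barrier adds a synchronization the data dependence does not require. */
#include <mpi.h>

void exchange_blocking(double *out, double *in, int n, int peer, MPI_Comm comm)
{
    MPI_Barrier(comm);                                    /* unnecessary */
    MPI_Sendrecv(out, n, MPI_DOUBLE, peer, 0,
                 in,  n, MPI_DOUBLE, peer, 0, comm, MPI_STATUS_IGNORE);
}

/* Non-blocking version: post the transfer early, overlap it with local
 * work that does not touch the buffers, and wait only before use. */
void exchange_overlapped(double *out, double *in, int n, int peer,
                         MPI_Comm comm, void (*local_work)(void))
{
    MPI_Request req[2];
    MPI_Irecv(in,  n, MPI_DOUBLE, peer, 0, comm, &req[0]);
    MPI_Isend(out, n, MPI_DOUBLE, peer, 0, comm, &req[1]);
    local_work();                       /* interior computation, no halo needed */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
}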
Next we present results with an optimized irregular communication algorithm in the code. As indicated earlier, there are two main types of communication in POWERS: communication for the calculation of spatial gradients or fluxes, and communication for mapping well data onto the spatial grid. The former is regular neighbor communication and the latter is irregular communication. Irregular communication becomes very important when a large number of wells is present in the reservoir. We show execution times and speedups for Model III in Table 6 and Fig. 9, respectively. We also show results on Cluster_Q (using the Intel compiler v.6) and the IBM NH2 for reference. On 20 processors, our simulation took 8,000 seconds on Cluster_Q and 7,700 seconds on the IBM NH2. On Cluster_M (using the Intel compiler v.7), this simulation took 5,700 seconds for the original code and 4,860 seconds for the optimized code. The speedup of this model on the IBM NH2 was better than on the clusters. Simulations on the NH2 were done with a large number of OpenMP threads and a small number of MPI processes to keep the amount of irregular communication low. On the clusters, we had to use a large number of MPI processes.

In Fig. 10 we show results for Model I. T_opt/T_orig is the ratio of timings with the optimized communication algorithm to those with the original algorithm. Execution time was reduced by about 25% on 120 processors when our optimized irregular communication algorithm was used. Timings of regular and irregular communication for Model III are given in Fig. 11. The improvements in communication timings with our revised code are noticeable.

CPU   Cluster_Q (sec.)   Cluster_M (orig) (sec.)   Cluster_M (opt) (sec.)   IBM NH2 (sec.)
  1   56,100             45,830                    45,830                   —
  —   —                  7,350                     7,090                    —
 20   8,000              5,700                     4,860                    7,700
  —   —                  —                         3,720                    4,675

Table 6. Model III execution time

Fig. 9. Scalability of Model III (speedup versus number of processors for the NH2, the cluster with the original code, and the cluster with the optimized code)
Fig. 10. Effect of the optimized communication algorithm for Model I (T_opt/T_orig for total time and core time versus number of processors)

Fig. 11. Communication overhead in the Model III simulation (regular and irregular communication times, original and optimized, versus number of processors)
We next examine the domain decomposition parameters. Results for Models I and IV are given in Fig. 12; these simulations were made on 32 processors. The 1x32 blocking (i.e., the I dimension is not decomposed and the J dimension is decomposed into 32 parts in Fig. 4) was best for Model I, and the 4x8 blocking was best for Model IV. The original code had only an I-dimension decomposition using MPI processes (Fig. 3), which corresponds to the 32x1 blocking in Fig. 12. The optimal blocking parameters depend on the reservoir model (grid parameters).

Fig. 12. Timing with different decompositions in Models I and IV (total and core time versus blocking: 32x1, 16x2, 8x4, 4x8, 2x16, 1x32)
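Part of this sensitivity can be anticipated by counting halo cells: for an NI x NJ x NK grid split into PI x PJ areal blocks, the regular flux communication per process scales with the perimeter of its block, so the cheapest blocking depends on the grid's aspect ratio. The small sketch below is ours; the grid dimensions are placeholders rather than those of Models I or IV, and the well-related irregular communication is ignored.

/* Rough per-process halo size (in cells) for a PI x PJ areal blocking of an
 * NI x NJ x NK grid; smaller is better for regular flux communication. */
#include <stdio.h>

static long halo_cells(long ni, long nj, long nk, int pi, int pj)
{
    long bi = ni / pi, bj = nj / pj;        /* approximate block dimensions */
    long faces = 0;
    if (pi > 1) faces += 2 * bj * nk;       /* two I-direction side faces   */
    if (pj > 1) faces += 2 * bi * nk;       /* two J-direction side faces   */
    return faces;
}

int main(void)
{
    const long ni = 600, nj = 400, nk = 40; /* placeholder grid dimensions */
    int blockings[][2] = {{32,1},{16,2},{8,4},{4,8},{2,16},{1,32}};
    for (int b = 0; b < 6; b++)
        printf("%2dx%-2d blocking: ~%ld halo cells per process\n",
               blockings[b][0], blockings[b][1],
               halo_cells(ni, nj, nk, blockings[b][0], blockings[b][1]));
    return 0;
}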
6. Conclusions

Based on our observations in this work, we conclude the following:

1. PC clusters are a highly attractive option for parallel reservoir simulations in terms of both performance and cost.
2. Simulations of massive reservoir models (on the order of 10 million cells) are feasible at affordable computing cost and within a very short period.
3. Serial computation, synchronization, and I/O issues become significant at high numbers of processors (i.e., CPUs).
4. Commodity switches can be used effectively in a high-performance computing platform for reservoir simulation.
5. Both hardware and software tools, such as compilers, are constantly improving, which in turn will reduce execution times and computing costs on PC clusters.

Acknowledgements

The authors would like to thank Saudi Aramco management for permission to publish this paper. The authors would also like to express their sincere gratitude to N. Al-Rabeh, ECC General Manager, for his continuous encouragement and support. They would also like to extend their appreciation to the PEASD management, especially to R. Al-Rabeh and A. Al-Mossa, and to all members of the ECC Linux Cluster team.

Nomenclature

g = gravitational constant
H = depth
K_a = relative permeability of phase a
K = absolute permeability tensor
P_a = pressure of phase a
q = well flow rate
S_a = saturation of phase a
t = time
U_a = Darcy velocity of phase a
φ = porosity
ρ_a = density of phase a
μ_a = viscosity of phase a
References

1. Dogru, A. H., Li, K. G., Sunaidi, H. A., Habiballah, W. A., Fung, L., Al-Zamil, N., and Shin, D.: "A Massively Parallel Reservoir Simulator for Large Scale Reservoir Simulation," SPE paper 51886, presented at the 1999 SPE Reservoir Simulation Symposium, February 1999.
2. Fung, L. S. and Dogru, A. H.: "Efficient Multilevel Method for Local Grid Refinement for Massively Parallel Reservoir Simulation," Paper M-28, presented at the ECMOR VII European Conference on the Mathematics of Oil Recovery, Lago Maggiore, Italy, September 2000.
3. Buyya, R. and Baker, M.: "Cluster Computing at a Glance," High Performance Cluster Computing, Vol. 1, pp. 3-45, Prentice-Hall Inc., NJ, 1999.
4. Top500 supercomputer list, http://www.top500.org.
5. Hayder, M. E. and Jayasimha, D. N.: "Navier-Stokes Simulations of Jet Flows on a Network of Workstations," AIAA Journal, 34(4), 1996.
6. Bhandarkar, S. M. and Machaka, S.: "Chromosome Reconstruction from Physical Maps Using a Cluster of Workstations," Journal of Supercomputing, 11(1), pp. 61-86, 1997.
7. Komatitsch, D. and Tromp, J.: "Modeling of Seismic Wave Propagation at the Scale of the Earth on a Large Beowulf," presented at SC2001, November 2001.
8. Huwaidi, M. H., Tyraskis, P. T., Khan, M. S., Luo, Y., and Faraj, S. A.: "PC-Clustering at Saudi Aramco: from Concept to Reality," Saudi Aramco Journal of Technology, Spring 2003.
9. SPECfp2000 benchmark results, Standard Performance Evaluation Corporation, http://www.spec.org.
10. Habiballah, W., Hayder, M., Uwaiyedh, A., Khan, M., Issa, K., Zahrani, S., Tyraskis, T., Shaikh, R., and Baddourah, M.: "Prospects of Large Scale Reservoir Simulations at Saudi Aramco on PC Clusters," presented at the 2003 SPE Technical Symposium of Saudi Arabia, Dhahran, Saudi Arabia, June 2003.
11. Habiballah, W., Hayder, M., Uwaiyedh, A., Khan, M., Issa, K., Zahrani, S., Tyraskis, T., Shaikh, R., and Baddourah, M.: "Parallel Reservoir Simulation Utilizing PC-Clusters in Massive Reservoirs Simulation Models," to be presented at the SPE Annual Technical Conference and Exhibition, Denver, CO, USA, October 2003.
Appendix A

Fig. 13. A saturation plot of Model II